End of Sequence Read Archive (SRA) – some quick notes

Well, it seems that the Sequence Read Archive (SRA) is going away sometime in the near future. I posted about the SRA last week, and in the discussion someone posted an email message, supposedly from David Lipman of the NCBI, saying that the SRA is going to be closing. This has now been confirmed, and I thought I would just post some links discussing this.

Though I generally love NCBI, the Sequence/Short Read Archive (SRA) seems to have issues; what do others think?

Well, here goes. Hope to not get people from NCBI too pissed off here. Overall, I think NCBI is invaluable: GenBank, PubMed, PubMed Central (PMC) (well, I have some complaints about that but let’s not get into those here; I still like it), BLAST (Basic Local Alignment Search Tool) and a plethora of other tools, databases and resources. Generally, money well spent.

However, one database from NCBI is driving me a bit wacky these days. This is the Sequence Read Archive (SRA). Known to some as the “Short Read Archive”, this database is supposedly for storing “sequencing data from the next generation of sequencing platforms including Roche 454 GS System®, Illumina Genome Analyzer®, Life Technologies AB SOLiD System®, Helicos Biosciences Heliscope®, Complete Genomics®, and Pacific Biosciences SMRT®.”

It certainly seems to be used for that function. But alas, storing sequence is not the only need here. Recovering sequence and making use of it is really the key. And this is the area I have been having trouble with (especially related to environmental studies like rRNA PCR and metagenomics). Rather than go on about my particular issues here (and thus possibly biasing the discussion too much), I am wondering what others think of the SRA? Usability? Ease of deposition? Ease of extraction? Missing features? Things it does or does not do well? Do we need a new system for environmental projects?

Any and all comments welcome here or on twitter or on Friendfeed or wherever. See Friendfeed stream below:

http://friendfeed.com/treeoflife/4f09e201/though-i-generally-love-ncbi-sequence-short?embed=1

Here are some comments so far from Twitter:

  • digitalbio Sandra Porter I agree. RT @phylogenomics: Though I generally love NCBI, the Sequence/Short Read Archive (SRA) seems to hav… (cont) http://deck.ly/~XM75A
  • lswenson Luke Swenson @phylogenomics I was JUST trying to navigate the SRA! There’s no help section to be found, and forget about depositing sequences!
  • audyyy Davis-Richardson @phylogenomics I can never tell if my submission went through without emailing support. Also, no FASTQ support?
  • cabbageRed Rich C .@phylogenomics I agree, the SRA doesn’t seem to be the easiest repository to search with what I believe to be “typical” NGS queries