And two more coming. So proud of the undergrads in my lab who did this work and David Coil for coordinating it with help from Jenna Lang and Aaron Darling. Undergrads at UC Davis sequencing genomes of organisms they isolated. So cool.
Congratulations to all involved especially Jonathan Lo and Zack Bendiks, the undergrads who are first authors, and to David Coil who coordinated all the work.
In summary, the point of the project was to (1) start generating some reference genomes for microbes from the built environment and (2) engage undergraduates at UC Davis in genome sequencing and microbiology of the built environment projects.
The papers are published in a new open access journal from the American Society for Microbiology called “Genome Announcements”.
Thanks also to the Alfred P. Sloan Foundation which funds microBEnet and to the UC Davis Genome Center DNA Technologies Core facility which ran the sequencing. More papers are coming. Stay tuned.
Asked how mature the field of genome assembly is, Ian Korf at the University of California, Davis, compares it to a teenager with great capabilities. “It’s got bold assertions about what it can do, but at the same time it’s making embarrassing mistakes,” he says.
This is just ridiculous. Nature Publishing Group announced in 2007 that all papers in their journals that reported genome sequences would be made freely available and would be given a Creative Commons license: Shared genomes : Article : Nature.
About a year ago I posted to twitter (using the hashtag #opengate) and my blog about how Nature Publishing Group was not following through on their promises. See for example
Amazingly, and pleasantly, I note, in my complaining I exacted some responses from people from Nature Publishing Group who swore that these were just oversights and they would fix them. Well, alas, the money collecting machine of Nature Publishing Group is back.
For example, currently the following papers are not freely available even though at one point they were or they clearly fit in the “Shared genomes” definition Nature Publishing Group so happily promotes:
These above are all papers of mine, so I noticed them first (I noticed this when trying to create a Pinterest Board for all my papers; not being able to get to a free page for these papers meant I couldn’t add them to the Board). Could it be that Nature Publishing Group is just trying to get my goat? Let’s see. A brief search found these papers by others – all also not freely available even though all clearly fit Nature’s own definition of genome sequencing papers:
I think the funniest (and scariest) part may be the corrections and errata that are not freely available. And these are just the articles I found in a 15 minute search. I am sure there are more. Yes, Nature Publishing Group has made many genome papers freely available. That is great. Much better than many other publishers. But the cracks in your system are large and suggest that nobody there is actually dedicated to following through on the promises. Promises are meaningless. Follow through is the key. Come on Nature Publishing Group – how about assigning a “Free access ombudsman” or something like that who will make sure that free means free. I am sick of writing these posts. You should do your own QC …
UPDATE: see some more recent blog posts of mine about this topic:
UPDATE 3-28-12 1 PM PST: Well, if you look at the comments, Nature is apparently trying to fix this and most of the articles I listed above are now freely available (the corrections are still not free but they claim to be working on it). But a simple search of Nature finds there are still some papers that are closed off that shouldn’t be:
It’s not that hard to find these. It baffles me a bit how people at Nature don’t seem to be able to find them. But maybe I am just really good at searching …
It discusses the development of international annotation standards at NCBI (The National Center for Biotechnology Information) in collaboration with others. Note – the paper is Open Access.
Their abstract:
The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.
The paper refers extensively to workshops held by NCBI on genome annotation and gives a link to a page from NCBI with additional information about these workshops.
Now – never mind the extensive use of the term prokaryote in the paper … the paper has got a wealth of information and tidbits worth checking out.
Among the other sections worth checking out:
* Discussion of pseudogene annotation and identification
* Discussion of variation in structural annotation
* Evidence standards
* Functional annotation and naming guidelines
For anyone interested in annotating a genome – and more and more people are these days with the decrease in sequencing costs – this is a must read.
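To make the "minimal standards" idea from the abstract a bit more concrete, here is a rough sketch of the kind of check one could run on an annotation. Everything specific in it is my assumption for illustration: the function name `missing_minimal_features`, the input format (a list of feature type and product pairs), and the particular required sets (5S, 16S and 23S rRNAs, plus at least one tRNA per standard amino acid). It also ignores the "core conserved proteins" part entirely. Check the paper itself for the actual requirements.

```python
# Assumed for illustration only; consult the paper for the real requirements.
REQUIRED_RRNAS = {"5S", "16S", "23S"}
AMINO_ACIDS = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def missing_minimal_features(features):
    """Report which assumed-required rRNAs and tRNAs are absent.

    `features` is a list of (feature_type, product) tuples, a hypothetical
    simplified stand-in for a parsed annotation file.
    """
    rrnas = {product for kind, product in features if kind == "rRNA"}
    trnas = {product for kind, product in features if kind == "tRNA"}
    return {
        "rRNA": sorted(REQUIRED_RRNAS - rrnas),
        "tRNA": sorted(AMINO_ACIDS - trnas),
    }


# Toy annotation: no 23S rRNA and no tRNA-Trp.
example = [("rRNA", "5S"), ("rRNA", "16S")]
example += [("tRNA", aa) for aa in sorted(AMINO_ACIDS - {"Trp"})]

report = missing_minimal_features(example)
print(report)
```

A real check would of course parse GenBank or GFF records rather than hand-built tuples, but the logic (compare what is annotated against a required set and report the gaps) would look much the same.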
Just a quick post here. A new paper from my lab has come out in Bioinformatics. The paper is relatively simple. Titled “Mauve Assembly Metrics”, it reports the work of Aaron Darling and Andrew Tritt (with some minor contributions from me and Marc Facciotti). Aaron wrote the program Mauve when he was a student in Nicole Perna’s lab at Wisconsin: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Over the years he (and others) have continued to develop the program and have written a few more papers, including, for example, the one on the development of progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. This new paper basically reports a system (a set of scripts) to measure assembly quality. Here is the abstract:
High throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high quality reference genome, and measuring assembly accuracy using some metrics. We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
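For those who want a feel for the general approach described in the abstract (score assemblies against a reference, then compare across parameter choices), here is a toy sketch. It is not the Mauve Assembly Metrics code and does not use Mauve at all. It assumes the reference and each assembly have already been aligned into equal-length strings with `-` for gaps, and the function name `score_alignment`, the three error categories, and the `k=...` labels are all made up for illustration.

```python
def score_alignment(ref_aln, asm_aln):
    """Count simple per-column errors in a pre-aligned reference/assembly pair.

    Both strings must have the same length, with '-' marking gaps.
    """
    if len(ref_aln) != len(asm_aln):
        raise ValueError("aligned sequences must have the same length")
    counts = {"miscalled": 0, "missing": 0, "extra": 0}
    for ref_base, asm_base in zip(ref_aln, asm_aln):
        if ref_base == "-" and asm_base == "-":
            continue  # gap in both rows carries no information
        if asm_base == "-":
            counts["missing"] += 1    # reference base absent from assembly
        elif ref_base == "-":
            counts["extra"] += 1      # assembly base absent from reference
        elif ref_base != asm_base:
            counts["miscalled"] += 1  # substitution
    return counts


# Hypothetical alignments from three assembler parameter settings.
candidates = {
    "k=21": ("ACGTACGTAC", "ACGAACG--C"),
    "k=31": ("ACGT-ACGTAC", "ACGTTACGTAC"),
    "k=41": ("ACGTACGTAC", "ACCTACGTAA"),
}

scores = {name: score_alignment(ref, asm) for name, (ref, asm) in candidates.items()}
totals = {name: sum(counts.values()) for name, counts in scores.items()}
best = min(totals, key=totals.get)  # setting with the fewest total errors

for name in candidates:
    print(name, scores[name], "total:", totals[name])
print("best setting:", best)
```

Summing the three counts into one number is just the simplest way to rank the settings here. In practice you would probably want to weigh the different error types separately, which fits with the abstract's point that the choice of an optimal strategy is made manually.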
Note this paper was supported by a grant from the National Science Foundation (ER 0949453). That grant is focused on comparative genomics (sequencing and analysis) of halophilic archaea. Stay tuned for more on that project as we are writing up a series of papers …