Crosspost: Woohoo – two more genome announcement papers from our undergraduate project on built environment reference genomes

Crossposting this from the microBEnet blog.
Two new papers out from the microBEnet Undergraduate Research: Built Environment Reference Genomes project:
These go with two previously published ones:
And two more coming. So proud of the undergrads in my lab who did this work and David Coil for coordinating it with help from Jenna Lang and Aaron Darling.  Undergrads at UC Davis sequencing genomes of organisms they isolated. So cool.

Crosspost: New papers from our undergraduate “microbiology of the built environment” genome sequencing project

Crossposting from microBEnet.

We have two new papers out from our lab as part of our microBEnet supported undergraduate genome sequencing project:

Congratulations to all involved, especially Jonathan Lo and Zack Bendiks, the undergrads who are first authors, and to David Coil, who coordinated all the work.

More information about the project can be found on blog posts from my lab blog (https://phylogenomics.wordpress.com/category/undergraduate-genome-project/) and on a page here on microBEnet (http://www.microbe.net/undergraduate-research-built-environment-genomes/) and the YouTube video below:

 


In summary, the point of the project was to (1) start generating some reference genomes for microbes from the built environment and (2) engage undergraduates at UC Davis in genome sequencing and microbiology of the built environment projects.

The papers are published in a new open access journal from the American Society for Microbiology called “Genome Announcements”.

Thanks also to the Alfred P. Sloan Foundation which funds microBEnet and to the UC Davis Genome Center DNA Technologies Core facility which ran the sequencing.  More papers are coming.  Stay tuned.

One way to keep up with new genome sequence publications – SIGS compilation

This is a very, very helpful way to keep up with new genome sequence releases/publications: Genome sequences published outside of Standards in Genomic Sciences, October-mid November 2012 | Nelson | Standards in Genomic Sciences.  From Oranmiyan Nelson and George Garrity in the SIGS journal.  It is a bit mind boggling how many genome sequences are being determined and published.  Fun.  But mind boggling.  Anyway – good to have someone trying to keep track.  Also see GOLD: Genomes OnLine Database.

Quick post: nice review on de novo genome assembly

Just a quick post here.  There is a nice review by Monya Baker on de novo genome assembly in Nature Methods: De novo genome assembly: what every biologist should know : Nature Methods : Nature Publishing Group.  It is currently freely available though not sure if that is permanent or not …

Love the start, which quotes my colleague Ian Korf:

Asked how mature the field of genome assembly is, Ian Korf at the University of California, Davis, compares it to a teenager with great capabilities. “It’s got bold assertions about what it can do, but at the same time it’s making embarrassing mistakes,” he says.

The paper is definitely worth a look …

Hey Nature Publishing Group – When are you going to live up to your promises about "free" genome papers? #opengate #aaaaaarrgh

This is just ridiculous.  Nature Publishing Group announced in 2007 that all papers in their journals that reported genome sequences would be made freely available and would be given a Creative Commons license: Shared genomes : Article : Nature.

About a year ago I posted to twitter (using the hashtag #opengate) and my blog about how Nature Publishing Group was not following through on their promises.  See for example

and more including some from others
Amazingly, and pleasantly, I note, in my complaining I exacted some responses from people from Nature Publishing Group who swore that these were just oversights and they would fix them.  Well, alas, the money collecting machine of Nature Publishing Group is back.
For example, currently the following papers are not freely available even though at one point they were or they clearly fit in the “Shared genomes” definition Nature Publishing Group so happily promotes:
The papers above are all mine, so I noticed them first (I noticed this when trying to create a Pinterest Board for all my papers; not being able to get to a free page for these papers meant I couldn’t add them to the Board).  Could it be that Nature Publishing Group is just trying to get my goat?  Let’s see.  A brief search found these papers by others – all also not freely available even though all clearly fit Nature’s own definition of genome sequencing papers:
Here are some others

I think the funniest (and scariest) part may be the corrections and errata that are not freely available. And these are just the articles I found in a 15 minute search. I am sure there are more.  Yes, Nature Publishing Group has made many genome papers freely available.  That is great.  Much better than many other publishers.  But the cracks in your system are large and suggest that nobody there is actually dedicated to following through on the promises.  Promises are meaningless.  Follow through is the key.  Come on Nature Publishing Group – how about assigning a “Free access ombudsman” or something like that who will make sure that free means free.  I am sick of writing these posts.  You should do your own QC …

UPDATE: see some more recent blog posts of mine about this topic:

UPDATE 3-28-12 1 PM PST:
Well, if you look at the comments, Nature is apparently trying to fix this and most of the articles I listed above are now freely available (the corrections are still not free but they claim to be working on it).  But a simple search of Nature finds there are still some papers that are closed off that shouldn’t be:

It’s not that hard to find these.  It baffles me a bit how people at Nature don’t seem to be able to find them.  But maybe I am just really good at searching …

Important paper on annotation standards for bacterial/archaeal genomes – readying for the "data deluge"

Interesting paper in the journal “Standards in Genomic Sciences” that is worth checking out for anyone interested in genome sequencing and annotation. The paper is “Solving the Problem: Genome Annotation Standards before the Data Deluge” by William (aka Bill) Klimke et al.

It discusses the development of international annotation standards at NCBI (The National Center for Biotechnology Information) in collaboration with others. Note – the paper is Open Access.

Their abstract:

The promise of genome sequencing was that the vast undiscovered country would be mapped out by comparison of the multitude of sequences available and would aid researchers in deciphering the role of each gene in every organism. Researchers recognize that there is a need for high quality data. However, different annotation procedures, numerous databases, and a diminishing percentage of experimentally determined gene functions have resulted in a spectrum of annotation quality. NCBI in collaboration with sequencing centers, archival databases, and researchers, has developed the first international annotation standards, a fundamental step in ensuring that high quality complete prokaryotic genomes are available as gold standard references. Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone. The use of these standards in existing genomes and future submissions will increase the quality of databases, enabling researchers to make accurate biological discoveries.

The paper refers extensively to workshops held by NCBI on genome annotation and gives a link to a page from NCBI with additional information about these workshops.

Now – never mind the extensive use of the term prokaryote in the paper … the paper has got a wealth of information and tidbits worth checking out.

For example the paper has a nice table on annotation tools and databases and resources.

Among the other sections worth checking out:
* Discussion of pseudogene annotation and identification
* Discussion of variation in structural annotation
* Evidence standards
* Functional annotation and naming guidelines

For anyone interested in annotating a genome – and more and more people are these days with the decrease in sequencing costs – this is a must read.

New paper from my lab (& the Facciotti lab): Mauve Assembly Metrics #Halophiles #Genomics

Just a quick post here. A new paper from my lab has come out in Bioinformatics. The paper is relatively simple. Titled “Mauve Assembly Metrics” it reports work of Aaron Darling and Andrew Tritt (with some minor contributions from me and Marc Facciotti). Aaron wrote the program Mauve when he was a student in Nicole Perna’s lab at Wisconsin: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Over the years he (and others) have continued to develop the program and written a few papers too including for example, the development of progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. This new paper reports basically a system/scripts to measure assembly quality. Here is the abstract:

High throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high quality reference genome, and measuring assembly accuracy using some metrics. We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
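For anyone who has not met an assembly metric before, here is a minimal sketch of one widely used, reference-free example: N50. This is only an illustration and is not taken from the Mauve Assembly Metrics scripts, which score assemblies against a reference genome; the function name `n50` and the contig lengths below are made up for the example.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")

    # Walk the contigs from longest to shortest, accumulating bases.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (in bp) from two assemblies of the same reads.
assembly_a = [100, 80, 50, 30, 20, 10]
assembly_b = [40, 40, 40, 40, 40, 40, 50]
print(n50(assembly_a), n50(assembly_b))
```

A higher N50 means a more contiguous assembly, but it says nothing about whether the contigs are correct, which is why metrics that compare against a high quality reference are useful when tuning assembler parameters.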

Check out the paper: Mauve Assembly Metrics. Download the scripts/code (http://ngopt.googlecode.com) and Mauve, play around, and let me know what you think.
Note this paper was supported by a grant from the National Science Foundation (ER 0949453). That grant is focused on comparative genomics (sequencing and analysis) of halophilic archaea. Stay tuned for more on that project as we are writing up a series of papers ….
Some related links: