Open Metagenomics Highlight – Metagenome Annotation using massively parallel undergrads.

Another fun metagenomics-related paper in PLoS Biology. In it, Pascal Hingamp et al. discuss an Open Source, Open Science system for metagenome annotation (see PLoS Biology – Metagenome Annotation Using a Distributed Grid of Undergraduate Students).

They do this as part of a course on metagenome annotation, and the software for running it is all open source and available. They say:

“Teachers wishing to use the Annotathon for their courses are invited to create new teams on the public server (course logistics and team management are detailed in the instructor manual). The underlying open-source software (PHP and MySQL scripts, under a General Public License) is also available for local installation. In addition, a special “Open Access” team is available for freelance students (volunteer instructors are most welcome to help oversee the Open Access team).”

In a way, this is a metagenomics version of the Undergraduate Genomics Research Initiative (UGRI), which was described previously in a PLoS Biology paper.

Well, this is really the be-all and end-all for me, combining so many things I like – genomics, metagenomics, annotation, OA publishing, open source software, etc. Nice job Pascal et al. …

Open Metagenomics Highlight: Comparative Analysis of Human Gut Microbiota by Barcoded Pyrosequencing

OK – it is not quite metagenomics, but there is a new paper in PLoS One worth looking at if you study uncultured organisms. This paper (Comparative Analysis of Human Gut Microbiota by Barcoded Pyrosequencing) reports on a slightly new twist in carrying out deep rRNA surveys of uncultured microbes using one of the “next” generation sequencing methods.

Open Metagenomics Highlight – PLoS Biology paper reporting more from Banfield lab on the Acid Mine Drainage

Just a quick “Open Metagenomics” posting here. There is a very interesting paper in PLoS Biology that just came out reporting more detail from Jill Banfield’s lab on their studies of an Acid Mine Drainage (AMD) site. This paper is mostly a population genomic study of the microbes living in the AMD. See the paper at PLoS Biology – Population Genomic Analysis of Strain Variation in Leptospirillum Group II Bacteria Involved in Acid Mine Drainage Formation.

Open Metagenomics: Selenium in the Oceans

Well, I previously started an “Open Evolution” series here and now I am starting an “Open Metagenomics” series. I know, I have gotten grief from some out there (yes, you Rob Edwards – see comments here) about my support for somewhat non-open things in metagenomics, so I am going to try to make up for that as much as possible.

In the first installment, I am pointing people to a new paper in PLoS Genetics, “Trends in Selenium Utilization in Marine Microbial World Revealed through the Analysis of the Global Ocean Sampling (GOS) Project” by Yan Zhang and Vadim N. Gladyshev (hat tip to Katie Pollard for pointing out this paper).

In this paper the authors study selenium utilization using data from the first part of the Venter Global Ocean Sampling (GOS) expedition, which was metagenomic sequencing of multiple samples – mostly surface ocean water. The GOS data they use comes from the Rusch et al. paper in PLoS Biology (note for full disclosure … I was a co-author on this paper).

There have been challenges in the past with getting and using metagenomic data from other people’s publications, so I note that the authors here obtained the data sets from CAMERA, a metagenomics database supported by the Moore Foundation. It is my support of this database that Rob Edwards gave me grief about, since the database is not currently completely open (e.g., you need to register to use it and the software that runs it is not currently all open source).

Anyway, they got the data from CAMERA, and then did a pretty comprehensive analysis to search for genes and features in the data that would be indicative of selenium utilization. Selenium is of great interest to many biologists for many reasons, including that it is required for the synthesis and function of selenocysteine (Sec), which, if you do not know, occasionally goes by the nickname “the 21st amino acid”.

Without going into all the details of the paper, the last paragraph sums up the major features:

In this study, we report a comprehensive analysis of Sec utilization in marine microbial samples of the GOS expedition by characterizing the GOS selenoproteome. This is the first time that the microbial selenoprotein population is described in a global biogeographical context. Our analysis yielded the largest selenoprotein dataset to date, provided a variety of new insights into Sec utilization and revealed environmental factors that influence Sec utilization in the marine microbial world.

My favorite part of the paper is that they map some of the selenium-related features onto the globe. For example, in one figure they show the inferred selenoprotein “richness” in the different samples. (Selenoproteins are proteins that have selenocysteine in them.) Now I am sure they made many assumptions in reaching their inferences about selenium utilization, and I am also sure some of these will turn out to be bad ideas. But to me, this paper is a good example of what researchers will be able to do with metagenomic data in the future. Sitting at their computers anywhere in the world, researchers can now ask questions about the distribution patterns of functions in microbes in the world. Pretty cool. And the more open we are with the papers, the tools, and the data, the more likely this type of work is to spread.
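
For anyone wondering what a per-sample “richness” number might look like in practice, here is a minimal Python sketch. It assumes richness simply means the number of distinct selenoprotein families detected in each sample, which may well differ from the measure used in the paper. The function name, sample names, and family labels are all made up for illustration and are not the paper’s data format.

```python
from collections import defaultdict


def selenoprotein_richness(hits):
    """Count distinct selenoprotein families per sample.

    `hits` is an iterable of (sample_id, family_name) pairs, one per
    predicted selenoprotein sequence. Both fields are hypothetical
    placeholders, not the actual format of the GOS data.
    """
    families_by_sample = defaultdict(set)
    for sample_id, family_name in hits:
        # A set keeps each family once, however many sequences hit it.
        families_by_sample[sample_id].add(family_name)
    return {sample: len(families) for sample, families in families_by_sample.items()}


# Made-up example data: two samples, with one family seen twice in sample_A.
example_hits = [
    ("sample_A", "family_1"),
    ("sample_A", "family_1"),
    ("sample_A", "family_2"),
    ("sample_B", "family_3"),
]
print(selenoprotein_richness(example_hits))
```

Counting families rather than raw sequences is one possible choice here; it keeps a deeply sequenced sample from looking “richer” just because the same protein was seen many times.
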
The figures are from the paper and I am permitted to use them here because they were published under a Creative Commons license that allows anyone to use them as long as the source is cited. The source is Zhang Y, Gladyshev VN (2008) Trends in Selenium Utilization in Marine Microbial World Revealed through the Analysis of the Global Ocean Sampling (GOS) Project. PLoS Genet 4(6): e1000095. doi:10.1371/journal.pgen.1000095