Well, I previously started an “Open Evolution” series here, and now I am starting an “Open Metagenomics” series. I know, I have gotten grief from some out there (yes, you Rob Edwards – see comments here) about my support for some not-so-open things in metagenomics, so I am going to try to make up for that as much as possible.
In this first installment, I am pointing people to a new paper in PLoS Genetics, “Trends in Selenium Utilization in Marine Microbial World Revealed through the Analysis of the Global Ocean Sampling (GOS) Project” by Yan Zhang and Vadim N. Gladyshev (hat tip to Katie Pollard for pointing out this paper).
In this paper the authors study selenium utilization using data from the first part of Venter's Global Ocean Sampling (GOS) expedition, which involved metagenomic sequencing of multiple samples, mostly of surface ocean water. The GOS data they use come from the Rusch et al. paper in PLoS Biology (note, for full disclosure: I was a co-author on that paper).
There have been challenges in the past with obtaining and using metagenomic data from other people's publications, and I note that the authors here obtained the data sets from CAMERA, a metagenomics database supported by the Moore Foundation. Note that it was my support of this database that Rob Edwards gave me grief about, since the database is not currently completely open (e.g., you need to register to use it, and not all of the software that runs it is currently open source).
Anyway, they got the data from CAMERA and then did a fairly comprehensive analysis, searching for genes and other features in the data that would be indicative of selenium utilization. Selenium is of great interest to many biologists for many reasons, including that it is required for the synthesis and function of selenocysteine (Sec), which, if you do not know, occasionally goes by the nickname “the 21st amino acid.”
Without going into all the details of the paper, its last paragraph sums up the major findings:
In this study, we report a comprehensive analysis of Sec utilization in marine microbial samples of the GOS expedition by characterizing the GOS selenoproteome. This is the first time that the microbial selenoprotein population is described in a global biogeographical context. Our analysis yielded the largest selenoprotein dataset to date, provided a variety of new insights into Sec utilization and revealed environmental factors that influence Sec utilization in the marine microbial world.
My favorite part of the paper is that they map some of the selenium-related features onto the globe. For example, in one figure they show the inferred selenoprotein “richness” in the different samples. (Selenoproteins are proteins that contain selenocysteine.) Now, I am sure they made many assumptions in reaching their inferences about selenium utilization, and I am also sure some of these will turn out to be wrong. But to me, this paper is a good example of what researchers will be able to do with metagenomic data in the future. Sitting at their computers anywhere in the world, researchers can now ask questions about the global distribution patterns of microbial functions. Pretty cool. And the more open we are with the papers, the tools, and the data, the more likely this type of work is to spread.
The figures are from the paper and I am permitted to use them here because they were published under a Creative Commons license that allows anyone to use them as long as the source is cited. The source is Zhang Y, Gladyshev VN (2008) Trends in Selenium Utilization in Marine Microbial World Revealed through the Analysis of the Global Ocean Sampling (GOS) Project. PLoS Genet 4(6): e1000095. doi:10.1371/journal.pgen.1000095