California Metagenomics Meeting presentations available online

For those interested in metagenomics, the presentations from a metagenomics meeting that was held at the Moore Foundation HQ are now available online.

I wrote about the meeting previously in my blog.

The speakers were Eric Allen (UCSD-SIO), Doug Bartlett (UCSD-SIO), Weizhong Li (UCSD), Victor Markowitz (JGI), Jonathan Eisen (UCD), Victoria Orphan (Caltech), Adam Martiny (UCI), Jessica Green (UCM), Kimmen Sjolander (UCB), and Steven Brenner (UCB). All PowerPoint presentations are available in PDF format.

More on the Human Microbiome Program Workshop – Day 1

As a follow up to my previous blog I am posting some additional information here about the NIH Roadmap Human Microbiome Project Workshop, which was held in Bethesda, MD.

The general outline of the meeting was as follows:

  • Sunday Night
    • Introduction
      • Welcome by Francis Collins (NHGRI), Hugh Auchincloss (NIAID) and Griffin Rodgers (NIDDK)
      • Comments by Gary Schoolnik
      • Overview of the NAS report on metagenomics by Jim Tiedje
      • Overview of the NIH Roadmap program by Francis Collins
    • Introductory talks on human microbiome
      • Jeff Gordon
      • David Relman
      • Gary Huffnagle
      • Jo Handelsman
  • Monday AM
    • Technological issues
      • Elaine Mardis
      • Jill Banfield
      • Deirdre Meldrum
    • Bioinformatics issues
      • Lior Pachter
      • Rolf Apweiler
      • Peer Bork
    • ELSI issues
      • Pilar Ossorio
  • Lunch
  • Monday PM – Breakout sessions and discussion
    • Group 1 – Reference microbiome (Claire Fraser and Martin Blaser)
    • Group 2 – Changes in microbiome and human health (Rita Colwell and Martin Rosenberg)
    • Group 3 – Enabling technologies (Bruce Birren and Mary Lidstrom)
    • Group 4 – Bioinformatics tools (Ewan Birney and Owen White)
    • Group 5 – Ethical, legal and social issues (Mildred Cho)
  • Wrap up

Overall, I found the Sunday night talks very useful to set the stage. The introductory talks by the representatives from NHGRI, NIAID, and NIDDK clearly indicated that NIH as well as others consider the human microbiome an incredibly important research area. Then Jim Tiedje gave a nice overview of the recent NAS report on metagenomics (which was about metagenomics in general, not specifically for the human microbiome). The main points of the report are basically: microbes rule the world, metagenomics is a very powerful tool in studying them, and there is a need for a more coordinated effort among funding agencies to push metagenomics as a tool and a field. (My only complaint about Tiedje’s presentation was that he kept using the term “higher organisms” for those multicellular species with nuclei. But otherwise, he did a good job of concisely summarizing the report and the benefits as well as challenges of metagenomics.)

Francis Collins then gave an overview of the NIH Roadmap Program. The Roadmap was started in ~2003 as an initiative to identify projects that would need coordination across multiple NIH agencies. These projects should meet certain characteristics: truly transforming, require all NIH, must need incubator space, and the outcome should produce material into the public domain. Collins then discussed how, from among hundreds of suggestions, the Human Microbiome was picked as one of five topic areas for in-depth consideration for the new round of Roadmap competition. Thus the point of this workshop was to discuss this in more detail and help provide material and ideas for the full consideration of an HMP program.

I should note, I found one thing disappointing in the introduction which was a response to my question concerning whether this project would be limited only to studies of humans or would allow for studies of model systems that inform human work. The answer was basically that this would likely be limited to humans. I think this is a big mistake. The human genome project came to the realization that comparative studies with other species were critical to understanding and interpreting studies of the human genome. The same will be true of the human microbiome program.

Jeff Gordon then gave an overview of human microbiome studies, and focused on what are the key questions that need to be answered. Among the key questions: Do we share a core set of microbes? How should we view differences in microbes between people and over time? How do we relate communities of microbes to health and disease? How should we sample microbial communities to characterize them? What determines robustness of microbial communities in people?

To start to answer these and other questions, he suggested that we have three tiers of data collection: (1) deep draft assemblies of microbial communities and reference genomes, (2) reference microbiome work (deep characterization of individuals including information about the family history and genetics), and (3) 16S surveys of communities (a global human microbial diversity survey). I basically liked all of his ideas. He did talk about work in model organisms too. His work has shown just how important this is … and I think as I said above it needs to be emphasized more in the HMP.

David Relman, from Stanford, then talked about patterns in human microbial diversity. He talked about some of the challenges in such studies as well as results of his and others work. He discussed many interesting aspects of the diversity of samples, and the shapes of diversity. Some of the patterns he emphasized were that history plays a role in the diversity, that archaea generally seem to have limited presence, that diversity is uneven and complex.

Then Gary Huffnagle discussed in more detail the interaction of microbes with the host immune system. And Jo Handelsman discussed what she calls functional metagenomics, which involves focusing on the functions of genes found in the environment on top of examining the phylogenetic diversity of communities. Unfortunately, I did not take extensive notes for these two talks so do not have much to base my comments on here. In addition, I confess, the fact that the room in which the meeting was held was incredibly crowded and boiling hot, and the fact that I had flown in from California earlier in the day, made taking notes challenging at this point. However, that did not stop me from going out afterwards for a beer with Julian Parkhill, Ewan Birney, Owen White, and Jacques Ravel. The worst part of going out for the beer – I grew up in Bethesda but I made multiple wrong turns in the two blocks to the brew pub. I am sure from now on Julian and Ewan will never trust my directions. Fortunately, the fact that the pub had the Red Sox pummeling the Yankees on TV made up for my direction problems.

I will post more about the second day soon.

Jonathan Badger on another reason not to publish in non Open Access journals

Jonathan Badger on his blog has a good little blurb about how he cannot examine a recent piece of research because his institution does not have a subscription to the journal. So the authors lose a reader because they published in a non OA journal. Worth a little read for microbiologists since many may not have access to the journal Geology where the paper was published. But the paper was about a fossil that may have been of a fungus.

A human microbiome program?

I am currently attending a workshop sponsored by NIH in which the participants are discussing whether there should be a Human Microbiome Project, and if so, what that should mean.

First, what is generally meant by the “Microbiome”? In essence the human microbiome is the sum collection of all the microbes found in or on people. The human microbiome has become an important research field because the microbes that live in and among us play critical roles in human disease and health. An important aspect of this is the idea that microbes can be and are beneficial. For example, in the gut the normal microbes help with digestion and nutrient absorption as well as protect from infection. In addition, a variety of diseases (e.g., IBD, Crohn’s disease) seem likely to be caused by disruption in the normal microbial flora. In general, it seems likely that other ailments, like autoimmune diseases, allergies, etc. will be found to have a connection to disruptions in the beneficial microbes that live among us.

Because of the importance of beneficial / commensal microbes in human biology, there have been growing efforts to characterize the microbes in various body locations – gut, mouth, lungs, skin, etc. But the efforts so far have simply given a tantalizing taste of how interesting and important these microbes are. So here comes this meeting. Organized by NIH (specifically, Francis Collins at NHGRI), this workshop is geared to discuss the possibility that studies of the human microbiome will be included in the next list of “NIH Roadmap” programs. More on the NIH Roadmap some other time.

Basically, the general question is: do we need a big, organized program to tackle the human microbiome? To get us in the mood, we had talks by many of the pioneers/leaders in the field (e.g., David Relman, Jeff Gordon, Jim Tiedje) as well as discussion of the NIH Roadmap program. I personally did not need any convincing but it was good to hear some of the ideas presented. In the end, I think there is no doubt that a large scale Human Microbiome Program is needed and would be very beneficial.

One of the reasons that an organized effort is needed is that studies of the human microbiome are difficult. Reasons for this include:

1. Many of the microbes in the human system have not, and maybe cannot, be grown in isolation in the lab

2. The key features of the microbiome are determined by populations of microbes and thus even if a representative of a species could be grown in the lab, it would not represent all the diversity in the population.

3. The best way to sample the populations is via “metagenomic” sequencing which involves isolating DNA and sequencing it directly without culturing.

4. Many of the important sites contain hundreds of species each with significant variation within species.

5. There likely will be ENORMOUS variation in and among people. Within a person, there will be variation over time as well as great variation in different sites. On top of that there will be great variation between people.

Given these and other complications, it seems a no-brainer that there is a need for a coordinated project to gather background information about the human microbiome that would then be useful to researchers, much like the human genome was useful to many researchers. So what would such a project do? Here are some possibilities:

1. Sequence many “reference genomes.” By reference genomes here I mean genomes of cultured isolates that are closely related to organisms known in various human locations.

2. Do metagenomic sequencing of a variety of human microbiome samples.

3. Conduct large scale human microbiome diversity studies. This could involve rRNA PCR surveys as well as some amount of genome sequencing.

4. Develop the computational tools needed to analyze the massive amounts of data that will come out.

5. Encourage the development of new methods to aid in studies of the microbiome.

So today I guess we will be discussing what specific things are needed in more detail. But again, even though I do not really work on human microbiome projects much, I think it is pretty clear that the time is right for a Human Microbiome Program. And importantly, the methods and tools and discoveries that could come from this will be of use in all studies of microbes in the environment.

That’s all I have for now … will try to write more later.

Metagenomics, a visit to the Moore Foundation HQ, and things not to ask for from your Program Officer

OK – so here is my confession. I posted my blog last week about my brother’s birthday to make up for my ditching him on his 40th birthday to go to a meeting at the Moore Foundation HQ. I really needed to go to this meeting, for many reasons, and so I went just for the day (driving from Davis did not take anywhere near as long as I expected — just ~ 1.5 hours to the Presidio in SF where the Moore HQ is located). But with two kids including a newborn, I could not go to the A’s Yankees game with my brother and so I posted that birthday blog to do something.

Anyway – not a ton to report from the meeting. It was the second California Metagenomics workshop organized by UCSD as part of the CAMERA project. The last one was in Berkeley. There were some good talks but as usual the best thing was a chance to talk to people in person. The two best talks in my opinion were one by Victoria Orphan from Caltech and one by Jessica Green from UC Merced. Orphan talked about a special sorting method they are using to pull out cells of particular organisms from environmental samples for subsequent gene and genome sequencing. Green talked about her work on spatial ecology and biogeography of microbes. I think we desperately need more people like Green in the microbial ecology field — people who are taking methods and concepts used for “big” organisms and applying them to the microbial world (another example is Jen Martiny at Irvine who was not at the meeting but her husband Adam Martiny was there).

The most painful part of the meeting (other than the traffic on the way home) — the lame behavior of many of the scientists in regard to our hosts the Moore Foundation. Despite being told many times that people were expected to bus their own tables — few did. And even better, one of the participants (Adam Godzik) spent serious effort complaining about the coffee not being strong enough to the program officer from the Moore Foundation. And then asking her to get him some stronger coffee. Clearly he likes his coffee. But not really the best way to interact with a program officer.

Scientist Reveals Secret of the Ocean: It’s Him

Published: April 1, 2007

Maverick scientist J. Craig Venter has done it again. It was just a few years ago that Dr. Venter announced that the human genome sequenced by Celera Genomics was in fact, mostly his own. And now, Venter has revealed a second twist in his genomic self-examination. Venter was discussing his Global Ocean Voyage, in which he used his personal yacht to collect ocean water samples from around the world. He then used large filtration units to collect microbes from the water samples which were then brought back to his high tech lab in Rockville, MD where he used the same methods that were used to sequence the human genome to study the genomes of the 1000s of ocean dwelling microbes found in each sample. In discussing the sampling methods, Venter let slip his latest attack on the standards of science – some of the samples were in fact not from the ocean, but were from microbial habitats in and on his body.

“The human microbiome is the next frontier,” Dr. Venter said. “The ocean voyage was just a cover. My main goal has always been to work on the microbes that live in and on people. And now that my genome is nearly complete, why not use myself as the model for human microbiome studies as well. ”

It is certainly true that in the last few years, the microbes that live in and on people have become a hot research topic. So hot that the same people who were involved in the race to sequence the human genome have been involved in this race too. Francis Collins, Venter’s main competitor and still the director of the National Human Genome Research Institute (NHGRI), recently testified before Congress regarding this type of work. He said, “There are more bacteria in the human gut than human cells in the entire human body… The human microbiome project represents an exciting new research area for NHGRI.” Other minor players in the public’s human genome effort, such as Eric Lander at the Whitehead Institute and George Weinstock at Baylor College of Medicine are also trying to muscle their way into studies of the human microbiome.

But Venter was not going to have any of this. “This time, I was not going to let them know I was coming. There would be no artificially declared tie. We set up a cutting edge human microbiome sampling system on the yacht, and then headed out to sea. They never knew what hit them. Now I have finished my microbiome.”

Reactions among scientists range from amusement to indifference, most saying that it is unimportant whose microbiome was sequenced. But a few scientists expressed disappointment that Dr. Venter had once again subverted the normal system of anonymity. Recent human microbiome studies by other researchers have all involved anonymous donors. Jeff Gordon, at the Washington University in St. Louis expressed astonishment, “I have to fill out about 200 forms for every sample. It takes years to get anything done. And now Venter sails away with the prize. All I can say is, I will never listen to one of my review boards again.”

Venter had hinted at the possibility that something was amiss in an interview he gave last week for the BBC News. He said “Most of the samples we studied were from the ocean but a few were from people.” When the interviewer seemed stunned, Doug Rusch, one of Venter’s collaborators stepped in and said “Collected with the help of other people.”

Venter was apparently spurred to make the admission today that many of the samples were in fact from his own microbiome due to a video that surfaced on YouTube showing Jeff Hoffman, the person responsible for collecting the water samples, performing a tooth scraping of Venter and then replacing the ocean water filter with Venter’s tooth sample.

Venter said the YouTube video was immaterial, “Well, we wanted to wait a few more weeks to have the papers describing the human microbiome published. But in the interest of human health we are deciding to make the announcement today.”

Unlike with the human genome data however, Venter says all of the data from his personal microbiome will be made publicly available with no restrictions. “If there is one lesson I have learned it is that open access is better than closed access. The more people can access my microbiome, the more they will help me understand myself. Plus, unlike Collins and Lander, who publish only in fee-for access journals, we will be publishing our analysis in the inaugural issue of a new Open Access journal that is a joint effort between the Public Library of Science and Nature. It will be called PLoN, the Public Library of Nature.”

In making his microbiome available, Venter has yet again abandoned his genetic privacy as he did when making his own genome available. Interestingly, the microbiome helps explain one of the first findings that was announced regarding his own genome. Venter said that analysis of the samples that came from his intestine reveal that microbes may explain why even though he has an apoE4 allele in his own genome (which is associated with abnormal fat metabolism) he does not need to take fat-lowering drugs. “Apparently, I have some really good fat digesters living in my gut. They make up for what is missing in my own genome.”

Dr. Venter’s reason for having his own microbiome sequenced, he said in the interview was in part scientific curiosity — ”How could one not want to know about one’s own microbes?” As to opening himself to the accusation of egocentricity, he said, ”I’ve been accused of that so many times, I’ve gotten over it.”

The key question that remains is – which of the samples were really from the ocean and which are from Venter. Venter said “Our funding agencies, including the DOE and the Moore Foundation, have agreed that we should not explicitly reveal which samples are which as this will encourage people to develop better methods of analyzing such complex mixtures of different microbes. Next week we will be announcing an X-prize award for the person who can identify which samples are mine and where they came from in me.”

Rob Edwards, a freelance microbial genomics expert says “It won’t be difficult to tell which are which. In fact, we had already identified an anomalous sample from Venter’s previous ocean sampling work, but nobody would listen to us.”

Jonathan Eisen, an evolutionary biologist who used to work for Venter says “I am certain that a few creative evolutionary analyses can reveal which sample is which. In fact, we are starting analyzing the samples already in anticipation of the X-prize announcement.”

Others are not so confident. Ed Delong, an ocean microbiology expert from MIT says “We have spent years carefully selecting our ocean samples to make sure they are not contaminated with sewage from cruise ships or from city drains. And now this – a purposeful mixture of ocean and human. It could take years to clean up the mess.”

Venter does not seem concerned. “If nobody can figure out which sample is from me and which is from the ocean, then we have no hope of making any progress in studies of either human microbiomes or oceans.”

More importantly, many scientists want to know what Venter will do next. Some want to know so that they can make sure to stay out of the way. Others probably relish the potential to go head to head with Venter. In this regard, Venter is not shy. “Biofuels. There is a great future in biofuels.”

Metagenomics makes it to Slashdot

Metagenomics has finally hit the big time. A story on metagenomics is on Slashdot.

Metagenomics studies begin by extracting DNA from all the microbes living in a particular environmental sample; there could be thousands or even millions of organisms in one sample. The extracted genetic material consists of millions of random fragments of DNA that can be cloned into a form capable of being maintained in laboratory bacteria. These bacteria are used to create a “library” that includes the genomes of all the microbes found in a habitat, the natural environment of the organisms. Although the genomes are fragmented, new DNA sequencing technology and more powerful computers are allowing scientists to begin making sense of these metagenomic jigsaw puzzles. They can examine gene sequences from thousands of previously unknown microorganisms, or induce the bacteria to express proteins that are screened for capabilities such as vitamin production or antibiotic resistance.

There is some bizarre stuff in there but hey that is OK. This is linking to a story in Science Daily about metagenomics which in itself is based on a National Academy Report on the field. The NAS report is definitely worth looking at.

The people who ran the committee are a who’s who of the field including the chairs, Jo Handelsman (who coined the term metagenomics) and Jim Tiedje who is one of the grand gurus of environmental microbiology.

Global Ocean Survey to be on PBS Newshour with Jim Lehrer

Apparently they are running a story on the Venter Global Ocean Survey project on the NewsHour tonight.

Not sure exactly what they are saying but good that it has made it to my favorite news show.

Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biol 5(3): e82

I am posting here my recent paper that just came out in PLoS Biology on Environmental Shotgun Sequencing. With PLoS’s Creative Commons license I am allowed to do this, which makes me happy. The citation is Eisen JA (2007) Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biol 5(3): e82 doi:10.1371/journal.pbio.0050082

Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes

Jonathan A. Eisen

Citation: Eisen JA (2007) Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes. PLoS Biol 5(3): e82 doi:10.1371/journal.pbio.0050082

Published: March 13, 2007

Copyright: © 2007 Jonathan A. Eisen. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abbreviations: ESS, environmental shotgun sequencing; PCR, polymerase chain reaction; rRNA, ribosomal RNA

Jonathan A. Eisen is at the University of California Davis Genome Center, with joint appointments in the Section of Evolution and Ecology and the Department of Medical Microbiology and Immunology, Davis, California, United States of America. Web site: http://phylogenomics.blogspot.com. E-mail: jaeisen@ucdavis.edu

Series Editor: Simon Levin, Princeton University, United States of America

This article is part of the Oceanic Metagenomics collection in PLoS Biology. The full collection is available online at http://collections.plos.org/plosbiology/gos-2007.php.


Since their discovery in the 1670s by Anton van Leeuwenhoek, an incredible amount has been learned about microorganisms and their importance to human health, agriculture, industry, ecosystem functioning, global biogeochemical cycles, and the origin and evolution of life. Nevertheless, it is what is not known that is most astonishing. For example, though there are certainly at least 10 million species of bacteria, only a few thousand have been formally described [1]. This contrasts with the more than 350,000 described species of beetles [2]. This is one of many examples indicative of the general difficulties encountered in studying organisms that we cannot readily see or collect in large samples for future analyses. It is thus not surprising that most major advances in microbiology can be traced to methodological advances rather than scientific discoveries per se.

Examples of these key revolutionary methods (Table 1) include the use of microscopes to view microbial cells, the growth of single types of organisms in the lab in isolation from other types (culturing), the comparison of ribosomal RNA (rRNA) genes to construct the first tree of life that included microbes [3], the use of the polymerase chain reaction (PCR) [4] to clone rRNA genes from organisms without culturing them [5–7], and the use of high-throughput “shotgun” methods to sequence the genomes of cultured species [8]. We are now in the midst of another such revolution—this one driven by the use of genome sequencing methods to study microbes directly in their natural habitats, an approach known as metagenomics, environmental genomics, or community genomics [9].

Table 1.

Some Major Methods for Studying Individual Microbes Found in the Environment

In this essay I focus on one particularly promising area of metagenomics—the use of shotgun genome methods to sequence random fragments of DNA from microbes in an environmental sample. The randomness and breadth of this environmental shotgun sequencing (ESS)—first used only a few years ago [10,11] and now being used to assay every microbial system imaginable from the human gut [12] to waste water sludge [13]—has the potential to reveal novel and fundamental insights into the hidden world of microbes and their impact on our world. However, the complexity of analysis required to realize this potential poses unique interdisciplinary challenges, challenges that make the approach both fascinating and frustrating in equal measure.

Who Is Out There? Typing and Counting Microbes in the Environment

One of the most important and conceptually straightforward steps in studying any ecosystem involves cataloging the types of organisms and the numbers of each type. For a long time, such typing and counting was an almost insurmountable problem in microbiology. This is largely because physical appearance does not provide a valid taxonomic picture in microbes. Appearance evolves so rapidly that two closely related taxa may look wildly different and two distantly related taxa may look the same. This vexing problem was partially overcome in the 1980s through the use of rRNA-PCR (Table 1). This method allows microorganisms in a sample to be phylogenetically typed and counted based on the sequence of their rRNA genes, genes that are present in all cell-based organisms. In essence, a database of rRNA sequences [14,15] from known organisms functions like a bird field guide, and finding a rRNA-PCR product is akin to seeing a bird through binoculars. Rather than counting species, this approach focuses on “phylotypes,” which are defined as organisms whose rRNA sequences are very similar to each other (a cutoff of >97% or >99% identical is frequently used). The ability to use phylotyping to determine who was out there in any microbial sample has revolutionized environmental microbiology [16], led to many discoveries [e.g., 17], and convinced many people (myself included) to become microbiologists.
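The identity cutoff behind phylotypes can be made concrete with a small Python sketch. It is an illustration only and not the procedure used in any of the cited studies: it assumes the rRNA sequences are already aligned to equal length, and it uses a simple greedy rule in which each sequence joins the first phylotype whose founding sequence it matches above the cutoff. The function names and the toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore alignment columns where both sequences have a gap.
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def cluster_phylotypes(seqs, cutoff=97.0):
    """Greedy grouping: a sequence joins the first phylotype whose founding
    sequence it matches at more than `cutoff` percent identity; otherwise it
    founds a new phylotype. Results depend on input order (an assumption of
    this sketch, not a property of phylotyping in general)."""
    phylotypes = []  # list of (founding sequence, [member names])
    for name, seq in seqs.items():
        for founder, members in phylotypes:
            if percent_identity(seq, founder) > cutoff:
                members.append(name)
                break
        else:
            phylotypes.append((seq, [name]))
    return [members for _, members in phylotypes]


# Toy, made-up "rRNA" sequences of length 100.
seq_a = "ACGT" * 25
seq_b = "TT" + seq_a[2:]          # 2 mismatches vs seq_a (98% identical)
seq_c = seq_a[:90] + "N" * 10     # 10 mismatches vs seq_a (90% identical)
toy = {"seq_a": seq_a, "seq_b": seq_b, "seq_c": seq_c}

phylotypes_97 = cluster_phylotypes(toy, cutoff=97.0)  # seq_a and seq_b group together
phylotypes_99 = cluster_phylotypes(toy, cutoff=99.0)  # every sequence is its own phylotype
```

Moving the cutoff from 97% to 99% splits the first group, which is why the choice of cutoff changes how many phylotypes a sample appears to contain.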

The selective targeting of a single gene makes rRNA-PCR an efficient method for deep community sampling [18]. However, this efficiency comes with limitations, most of which are complemented or circumvented by the randomness and breadth of ESS. For example, examination of the random samples of rRNA sequences obtained through ESS has already led to the discovery of new taxa—taxa that were completely missed by PCR because of its inability to sample all taxa equally well (e.g., [19]). In addition, ESS provides the first robust sampling of genes other than rRNA, and many of these genes can be more useful for some aspects of typing and counting. Some universal protein coding genes are better than rRNA both for distinguishing closely related strains (because of third position variation in codons) and for estimating numbers of individuals (because they vary less in copy number between species than do rRNA genes) [10]. Perhaps most significantly, ESS is providing groundbreaking insights into the diversity of viruses [20,21], which lack rRNA genes and thus were left out of the previous revolution.
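To see why copy number matters for counting, consider a small Python sketch with made-up numbers. It assumes the copies per genome are known for each taxon (often they are not), and it ignores gene length and sampling noise; the taxon names and counts are hypothetical.

```python
def relative_abundance(gene_counts, copies_per_genome):
    """Estimate the relative abundance of cells from gene observation counts
    by dividing each count by the number of gene copies per genome."""
    per_genome = {
        taxon: gene_counts[taxon] / copies_per_genome[taxon]
        for taxon in gene_counts
    }
    total = sum(per_genome.values())
    return {taxon: value / total for taxon, value in per_genome.items()}


# Hypothetical sample: taxon_A carries 7 rRNA copies per genome, taxon_B only 1.
rrna_counts = {"taxon_A": 700, "taxon_B": 100}
rrna_copies = {"taxon_A": 7, "taxon_B": 1}

# Uncorrected rRNA counts suggest taxon_A makes up 87.5% of the community.
raw_share_a = rrna_counts["taxon_A"] / sum(rrna_counts.values())

# After correcting for copy number, the two taxa come out equally abundant.
corrected = relative_abundance(rrna_counts, rrna_copies)
```

A gene present once per genome in every taxon would not need this correction, which is the advantage of the universal protein coding genes described above.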

Certainly, many challenges remain before we can fully realize the potential of ESS for the typing and counting of species, including making automated yet accurate phylogenetic trees of every gene, determining which genes are most useful for which taxa, combining data from different genes even when we do not know if they come from the same organisms, building up databases of genes other than rRNA, and making up for the lack of depth of sampling. If these challenges are met, ESS has the potential to rewrite much of what we thought we knew about the phylogenetic diversity of microbial life.

What Are They Doing? Top Down and Bottom Up Approaches to Understanding Functions in Communities

A community is, of course, more than a list of types of organisms. One approach to understanding the properties and functioning of a microbial community is to start with studies of the different types of organisms and build up from these individuals to the community. Ideally, to do this one would culture each of the phylotypes and study its properties in the lab. Unfortunately, many, if not most, key microbes have not yet been cultured [22]. Thus, for many years, the only alternative was to make predictions about the biology of particular phylotypes based on what was known about related organisms. Unfortunately, this too does not work well for microbes since very closely related organisms frequently have major biological differences. For example, Escherichia coli K12 and E. coli O157:H7 are strains of the same species (and considered to be the same phylotype), with genomes containing only about 4,000 genes, yet each possesses hundreds of functionally important genes not seen in the other strain [23]. Such differences are routine in microbes, and thus one cannot make any useful inferences about what particular phylotypes are doing (e.g., type of metabolism, growth properties, role in nutrient cycling, or pathogenicity) based on the activities of their relatives.

These difficulties—the inability to culture most microbes and the functional disparities between close relatives—led to one of the first kinds of metagenomic analyses, wherein predictions of function were made from analysis of the sequence of large DNA fragments from representatives of known phylotypes. This approach has provided some stunning insights, such as the discovery of a novel form of phototrophy in the oceans [24]. However, this large insert approach has the same limitation as predicting properties from characterized relatives—a single cell cannot possibly represent the biological functions of all members of a phylotype.

ESS provides an alternative, more global way of assessing biological functions in microbial communities. As when using the large insert approach, functions can be predicted from sequences. However, in this case the predicted functions represent a random sampling of those encoded in the genomes of all the organisms present. This approach has unquestionably been wildly successful in terms of gene discovery. For example, analysis of ESS data has revealed novel forms of every type of gene family examined, as well as a great number of completely novel families (e.g., [25]). However, there is a major caveat when using ESS data to make community-level inferences. Ecosystems are more than just a bag of genes—they are made up of compartments (e.g., cells, chromosomes, and species), and these compartments matter. The key challenge in analyzing ESS data is to sort the DNA fragments (which are usually less than 1,000 base pairs long relative to genome sizes of millions or billions of bases) into bins that correspond to compartments in the system being studied.

A recent study by my colleagues and me illustrates the importance of compartments when interpreting ESS data. When we analyzed ESS data from symbionts living inside the gut of the glassy-winged sharpshooter (an insect that has a nutrient-limited diet), we were able to bin the data to two distinct symbionts [26]. We then could infer from those data that one of the symbionts synthesizes amino acids for the host while the other synthesizes the needed vitamins and cofactors. Modeling and understanding of this ecosystem are greatly enhanced by the demonstration of this complementary division of labor, in comparison to simply knowing that amino acids, vitamins, and cofactors are made by “symbionts.”

How does one go about binning ESS data? A variety of approaches have been developed, some of which are described in Table 2. In considering the different binning methods and their limitations, the first question one needs to ask is, what are we trying to bin? Is it fragments from the same chromosome from a single cell, which would be useful for studying chromosome structure? If so, then perhaps genome assembly methods are the best. What if instead, as in the sharpshooter example, we are trying to have each bin include every fragment that came from a particular species, knowledge which may be useful for predicting community metabolic potential? If the level of genetic polymorphism among individual cells from the same species is high, then genome assembly methods may not work well (the polymorphisms will break up assemblies). A better approach might be to look for species-specific “word” frequencies in the DNA, such as ones created by patterns in codon usage. The challenge is, how do we tune the methods to find the right target level of resolution? If we are too stringent, most bins will include only a few fragments. But if we are too relaxed, we will create artificial constructs that may prove biologically misleading, such as grouping together sequences from different species. To make matters more complex, most likely the stringency needed will vary for different taxa present in the sample.
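To make the stringency trade-off concrete, here is a minimal Python sketch of composition-based binning. It is an illustration only, not the method used in any of the studies cited here. The function names (kmer_profile, bin_fragments), the greedy clustering rule, the Euclidean distance, and the threshold values are all assumptions chosen for the example. It uses dinucleotides on toy sequences, where work on real reads would more likely use longer words such as tetranucleotides.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=2):
    """Return the normalized frequency of every k-mer over A, C, G, T."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # words with other letters are ignored
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def distance(p, q):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bin_fragments(fragments, threshold, k=2):
    """Greedy binning (hypothetical rule): a fragment joins the first bin whose
    seed profile lies within `threshold`; otherwise it starts a new bin."""
    bins = []  # each entry is (seed_profile, [fragment indices])
    for idx, frag in enumerate(fragments):
        profile = kmer_profile(frag, k)
        for seed, members in bins:
            if distance(profile, seed) <= threshold:
                members.append(idx)
                break
        else:
            bins.append((profile, [idx]))
    return [members for _, members in bins]


# Toy data: two AT-rich fragments followed by two GC-rich fragments.
fragments = [
    "ATATATTAATATTATA",
    "TTATATAATATATAAT",
    "GCGCGGCCGCGCCGGC",
    "CGGCGCGCCGCGGCGC",
]

for threshold in (0.05, 0.5, 2.0):
    print(threshold, bin_fragments(fragments, threshold))
```

With these toy fragments, a threshold of 0.5 separates the AT-rich pair from the GC-rich pair, 0.05 leaves every fragment in its own bin, and 2.0 lumps all four together. The three settings correspond to a reasonable, a too stringent, and a too relaxed choice.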

Table 2. Methods of Binning

Another critical issue is the diversity of the system under study. Generally, binning works better when there are few different phylotypes present, all of which are distantly related and form discrete populations. This is why binning works well for the sharpshooter system and other relatively isolated, low diversity environments. Binning increases in difficulty exponentially as the number of species increases: the populations and species start to merge together, and the populations get more and more polymorphic and variable in relative abundance (such as in the paper about the Global Ocean Sampling expedition in this issue [27]). Further complicating binning is the phenomenon of lateral gene transfer, where genes are exchanged between distantly related lineages at rates that are high enough that random sampling of a genome will frequently include genes with multiple histories.

Despite these challenges, I believe we can develop effective binning methods for complex communities. First, we can combine different approaches together, such as using one method to sort in a relaxed manner and then using another to subdivide the bins provided by the first method. Second, we can incorporate new approaches such as population genetics into the analysis [28]. In addition, the lessons learned here can be applied to other aspects of metagenomics (e.g., the counting and typing discussed above) and provide insights into the nature of microbial genomes and the structure of microbial populations and communities.
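The idea of sorting in a relaxed manner with one method and then subdividing with another can be sketched in a few lines of Python. This is only one possible arrangement, written for illustration. The choice of GC content for the first pass, dinucleotide composition for the second, the function names, and the default values of gc_width and threshold are all assumptions, and fragments are assumed to be non-empty.

```python
from collections import Counter
import math


def gc_content(seq):
    """Fraction of G and C bases in a non-empty fragment."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinuc_profile(seq):
    """Normalized dinucleotide frequencies as a dict."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two frequency dicts."""
    words = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def coarse_bins(fragments, gc_width=0.25):
    """Stage 1 (relaxed): group fragment indices into wide GC-content intervals."""
    last = int(1 / gc_width) - 1
    bins = {}
    for idx, frag in enumerate(fragments):
        key = min(int(gc_content(frag) / gc_width), last)
        bins.setdefault(key, []).append(idx)
    return [bins[key] for key in sorted(bins)]


def refine_bin(fragments, members, threshold):
    """Stage 2 (stricter): split one coarse bin by dinucleotide composition."""
    groups = []  # each entry is (seed_profile, [fragment indices])
    for idx in members:
        profile = dinuc_profile(fragments[idx])
        for seed, group in groups:
            if profile_distance(profile, seed) <= threshold:
                group.append(idx)
                break
        else:
            groups.append((profile, [idx]))
    return [group for _, group in groups]


def two_stage_binning(fragments, gc_width=0.25, threshold=0.3):
    """Run the relaxed pass, then subdivide each coarse bin."""
    result = []
    for members in coarse_bins(fragments, gc_width):
        result.extend(refine_bin(fragments, members, threshold))
    return result


# Toy data: three fragments with no G or C, plus one that is all G and C.
example = [
    "ATATATATATATATAT",
    "AAAAAAAATTTTTTTT",
    "TATATATATATATATA",
    "GCGCGCGCGCGCGCGC",
]
print(coarse_bins(example))
print(two_stage_binning(example))
```

In this toy case the GC pass alone cannot tell the first three fragments apart, while the second pass separates the alternating AT fragments from the one made of long A and T runs.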

Comparative Metagenomics

So far, I have discussed issues relating mostly to intrasample analysis of ESS data. However, the area with perhaps the most promise involves the comparative analysis of different samples. This work parallels the comparative analysis of genomes of cultured species. Initial studies of that type compared distantly related taxa with enormous biological differences. What has been learned from these studies pertains mostly to core housekeeping functions, such as translation and DNA metabolism, and to other very ancient processes [29,30]. It was not until comparisons were made between closely related organisms that we began to understand events that occurred on shorter time scales, such as selection, gene transfer, and mutation processes [31]. Similarly, the initial comparisons of ESS data involved comparisons of wildly different environments [32], yielding insights into the general structure of communities. But as more comparisons are made between similar communities [33,34], such as those sampled during vertical and horizontal ocean transects [27,35–37], we will begin to learn about shorter time scale processes such as migration, speciation, extinction, responses to disturbance, and succession. It is from a combination of both approaches—comparing both similar and very divergent communities—that we will be able to understand the fundamental rules of microbial ecology and how they relate to ecological principles seen in macro-organisms.

Conclusions

In promoting some of the exciting opportunities with ESS, I do not want to give the impression that it is flawless. It is helpful in this respect to compare ESS to the Internet. As with the Internet, ESS is a global portal for looking at what occurs in a previously hidden world. Making sense of it requires one to sort through massive, random, fragmented collections of bits of information. Such searches need to be done with caution because any time you analyze such a large amount of data, patterns can be found. In addition, as with the Internet, there is certainly some hype associated with ESS that gives relatively trivial findings more attention than they deserve. Overall, though, I believe the hype is deserved. As long as we treat ESS as a strong complement to existing methods, and we build the tools and databases necessary for people to use the information, it will live up to its revolutionary potential.

Acknowledgments

I thank Simon Levin, Joshua Weitz, Jonathan Dushoff, Maria-Inés Benito, Doug Rusch, Aaron Halpern, and Shibu Yooseph for helpful discussions, and Melinda Simmons, Merry Youle, and three anonymous reviewers for helpful comments on the manuscript. The writing of this paper was supported by National Science Foundation Assembling the Tree of Life Grant 0228651 to Jonathan A. Eisen and by the Defense Advanced Research Projects Agency under grants HR0011-05-1-0057 and FA9550-06-1-0478.

References

  1. Gould SJ (1996) Full house: The spread of excellence from Plato to Darwin. New York: Harmony Books. 244 p.
  2. Evans AV, Bellamy CL (1996) An inordinate fondness for beetles. New York: Holt. 208 p.
  3. Woese C, Fox G (1977) Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc Natl Acad Sci U S A 74: 5088–5090.
  4. Mullis K, Faloona F (1987) Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol 155: 335–350.
  5. Reysenbach AL, Giver LJ, Wickham GS, Pace NR (1992) Differential amplification of rRNA genes by polymerase chain reaction. Appl Environ Microbiol 58: 3417–3418.
  6. Medlin L, Elwood HJ, Stickel S, Sogin ML (1988) The characterization of enzymatically amplified eukaryotic 16S-like ribosomal RNA-coding regions. Gene 71: 491–500.
  7. Weisburg W, Barns S, Pelletier D, Lane D (1991) 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol 173: 697–703.
  8. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496–512.
  9. Handelsman J (2004) Metagenomics: Application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev 68: 669–685.
  10. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304: 66–74.
  11. Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37–43.
  12. Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, et al. (2006) Metagenomic analysis of the human distal gut microbiome. Science 312: 1355–1359.
  13. Garcia Martin H, Ivanova N, Kunin V, Warnecke F, Barry KW, et al. (2006) Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat Biotechnol 24: 1263–1269.
  14. Olsen GJ, Larsen N, Woese CR (1991) The ribosomal RNA database project. Nucleic Acids Res 19: 2017–2021.
  15. Cole JR, Chai B, Farris RJ, Wang Q, Kulam-Syed-Mohideen AS, et al. (2007) The ribosomal database project (RDP-II): Introducing myRDP space and quality controlled public data. Nucleic Acids Res 35: D169–D172.
  16. Pace NR (1997) A molecular view of microbial diversity and the biosphere. Science 276: 734–740.
  17. Hugenholtz P, Pitulle C, Hershberger KL, Pace NR (1998) Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180: 366–376.
  18. Sogin ML, Morrison HG, Huber JA, Welch DM, Huse SM, et al. (2006) Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc Natl Acad Sci U S A 103: 12115–12120.
  19. Baker BJ, Tyson GW, Webb RI, Flanagan J, Hugenholtz P, et al. (2006) Lineages of acidophilic archaea revealed by community genomic analysis. Science 314: 1933–1935.
  20. Angly FE, Felts B, Breitbart M, Salamon P, Edwards RA, et al. (2006) The marine viromes of four oceanic regions. PLoS Biol 4: e368 doi:10.1371/journal.pbio.0040368.
  21. Edwards RA, Rohwer F (2005) Viral metagenomics. Nat Rev Microbiol 3: 504–510.
  22. Leadbetter JR (2003) Cultivation of recalcitrant microbes: Cells are alive, well and revealing their secrets in the 21st century laboratory. Curr Opin Microbiol 6: 274–281.
  23. Perna NT, Plunkett G 3rd, Burland V, Mau B, Glasner JD, et al. (2001) Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409: 529–533.
  24. Beja O, Aravind L, Koonin EV, Suzuki MT, Hadd A, et al. (2000) Bacterial rhodopsin: Evidence for a new type of phototrophy in the sea. Science 289: 1902–1906.
  25. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Expanding the universe of protein families. PLoS Biol 5: e16 doi:10.1371/journal.pbio.0050016.
  26. Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, et al. (2006) Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biol 4: e188 doi:10.1371/journal.pbio.0040188.
  27. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al. (2007) The Sorcerer II Global Ocean Sampling expedition: Northwest Atlantic through Eastern Tropical Pacific. PLoS Biol 5: e77 doi:10.1371/journal.pbio.0050077.
  28. Johnson PL, Slatkin M (2006) Inference of population genetic parameters in metagenomics: A clean look at messy data. Genome Res 16: 1320–1327.
  29. Koonin EV, Mushegian AR (1996) Complete genome sequences of cellular life forms: Glimpses of theoretical evolutionary genomics. Curr Opin Genet Dev 6: 757–762.
  30. Mushegian AR, Koonin EV (1996) A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci U S A 93: 10268–10273.
  31. Eisen JA (2001) Gastrogenomics. Nature 409: 463–465, 465–466.
  32. Tringe SG, von Mering C, Kobayashi A, Salamov AA, Chen K, et al. (2005) Comparative metagenomics of microbial communities. Science 308: 554–557.
  33. Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, et al. (2006) Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7: 57.
  34. Rodriguez-Brito B, Rohwer F, Edwards RA (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7: 162.
  35. DeLong EF (2005) Microbial community genomics in the ocean. Nat Rev Microbiol 3: 459–469.
  36. DeLong EF, Preston CM, Mincer T, Rich V, Hallam SJ, et al. (2006) Community genomics among stratified microbial assemblages in the ocean’s interior. Science 311: 496–503.
  37. Worden AZ, Cuvelier ML, Bartlett DH (2006) In-depth analyses of marine microbial community genomics. Trends Microbiol 14: 331–336.

Venter Global Ocean Voyage Press Conference

For those interested in metagenomics, microbial diversity and ocean microbiology, there will be a press conference tomorrow run by the Venter Institute relating to a series of papers (I am an author on some) coming out in PLoS Biology. The papers relate to Venter’s Global Ocean Voyage – sailing around the world collecting microbial samples. These were then used for environmental shotgun sequencing, and the papers discuss various aspects of analyzing the data.

Say what you want about metagenomics, and Craig, and genomics, if you are a critic. But (1) read the papers, and (2) give Venter some credit for publishing in an Open Access journal, unlike many of the so-called “public” genome effort folks who generally only pretend to support public/open access to anything.

Here is a link to view the live web cast of the PLoS Biology GOS Expedition publication press conference. The press conference will be held tomorrow, March 13, from 10-11 a.m. EST. After tomorrow an archive of the web cast will be hosted on the JCVI web site.

The papers are now live on the PLoS Biology Web Site.

The Global Ocean Sampling Collection can be found here.

My essay on Environmental Shotgun Sequencing can be found here.