Metagenomics Meeting — Competing with the Election

Well, I apologize, but I am not going to post anything today about the metagenomics meeting in San Diego, since I arrived late; I wanted to be at home for the beginning of the election. But I made my way down to San Diego and made it to dinner. The dinner “entertainment” was a talk by one of the grand gurus of ocean microbiology – Steven Giovannoni. Alas, even he realized that he was competing with people wanting to know about the election, and I confess I spent most of his talk hitting reload on my phone and surfing between sites. So I have no notes to post about his talk. But I can say that I am happy about the election. Tomorrow I will try to post some notes about talks, but I may still be too happy to take notes …

(Note added later: in retrospect, I (and others I talked to) felt Steve G.’s talk had way, way too much detail for an after-dinner talk, so I spent the next day taking much of the detail out of my talk to lighten it up. What did this get me? After my talk, and later over drinks, Steve G. made it clear he thought it stunk because it was too light on the details of something he thought should have been in it. Oh well, I guess this goes to show you cannot make everyone happy.)

Lake Arrowhead notes – UPDATED

Well, I gave my talk.

It seemed to go OK, except that I got cut off early because the chair ignored the fact that the session started late. But that is OK.

George Weinstock is now speaking using my laptop, so I am trying to post from my phone.

He mentioned one key thing I left out: big-scale microbial sequencing projects are now possible thanks to next-gen sequencing, in particular the 454-Roche tools.

More later

————————————————-

More now

George Weinstock gave a good overview of the “Human Microbiome Project”, an NIH Roadmap initiative to catalogue the genomic content of the microbes associated with humans. He described some of the big picture of why to do the project and the different funding initiatives being run through NIH, and he gave some detail on the “jumpstart” project going on at the big genome centers right now. He outlined how the current plan is to select a few hundred people and survey their microbiomes from multiple body sites using rRNA PCR and possibly metagenomics. In addition, he described an effort to sequence hundreds, if not a thousand, genomes of cultured organisms isolated from human environments. He did say one thing I disagreed with: he thinks it is somewhat reasonable to treat the environment that microbes live in as, in essence, a big bag of genes. In other words, if you sequence from a community, he implied that one can focus just on the genes and their functions and not on the organisms they come from. On this I disagree (and I pointed this out after the next talk). But overall George gave a nice overview of the project and its goals.

Eric Wommack gave a good talk about the viral metagenomics work he has been doing. He pointed out that a lot of the viral world is “unknown”, but that does not mean it is unimportant. This is consistent with what George Weinstock and I said, which is that we need more genome data from viral isolates. Eric presented some very useful results on the challenges of using short-read sequence data in metagenomics and referenced a few papers on this. He also referred to a cool viral genome survey project by Hatfull that I was not aware of, which involved undergraduates in sequencing and analyzing the genomes of phage that infect Mycobacterium smegmatis.

Jim Bristow spoke on biofuels. He gave a summary of some of the JGI work on the genomics of cellulolytic organisms and processes, focusing on the termite gut community, and had some good one-liners about this (e.g., he said many people want to kill termites, but not JGI: they are our friends; he also said “it takes a village to sequence a termite gut”).

Not sure exactly how to say this, but here goes. There was one talk in the morning I was not overly fond of: a talk by Bernard Palsson. Now I confess, I am not overly familiar with much of his work, but what I know of it suggests he does some really solid, interesting and important work on metabolic network modeling and analysis. But his talk at this meeting was disappointing. It was about his use of genome sequencing to characterize “adaptive evolution” in E. coli, and the results he presented seemed solid enough. The problem I had was that it was a prime example of “overselling genomics”. Why? Here is what they did. They took E. coli mutants and put them through cycles of growth and dilution. Then they looked at the populations after a certain number of generations and did a variety of analyses, including whole genome sequencing that helped identify mutations arising in the cultures. They then did some characterization of these mutations/mutants, including some competition experiments and some pretty interesting gene expression studies of some RNA polymerase mutants. From these results he drew conclusions such as that E. coli in the lab can find new adaptive peaks, that mutations differ between replicates, that different mutations confer different fitness, that they can monitor the appearance of mutations over time, and so on.

So what is the problem? The problem is that he (1) presented this as though the serial cycling of E. coli was novel when in fact it is not, and (2) presented the conclusions as though they were novel when they also are not. People have been doing this type of experiment for many decades (in fact, one person, Rich Lenski, has been running an experiment like this for decades). And they get these exact results. But they have not sequenced genomes as part of their experiments. And thus, at least in this talk, that earlier work was not mentioned, and the rediscovery of many truisms of population genetics was presented as novel because it involved genome sequencing.

Trent Northen created a serious buzz during and after his talk with his presentation of some of the things one can do with Nanostructure Initiated Mass Spectrometry (NIMS). I confess – I want his toys.

Lynn Silver is now talking about the challenges in the development of new antibiotics. She argues that the focus by some on trying to find new targets for antibiotics has been a bit misguided.

Julian Parkhill gave a good talk about the population genomics of Salmonella. He pointed out a few things people still ignore. For example, if you want to identify polymorphisms in a species for population genetics/genomics studies, you really need to survey diverse members of the population to find them. If you do not, and you then use a biased set of polymorphisms, your population inferences will be wrong. He also said, in response to a question of mine, that at least for this species they see very little variation in gene copy number, which is different from what people seem to see in humans.

Tiffany Williams from Baylor gave a talk about using high throughput sequencing in collaborations with developing countries. She outlined some of the challenges as well as the benefits from such collaborations.

Kim Lewis gave a very interesting talk on microbial biofilms and persister cells, of which I know vanishingly little. He showed some very cool experiments trying to “complement” unculturable organisms and get them to grow.

Jeffrey F. Miller gave a talk focusing on diversity-generating retroelements in bacteria, which appear to be a means by which bacteria can target particular regions of the genome for mutagenesis, in a way comparable to VDJ recombination in humans. This was perhaps my favorite talk so far at the meeting, as it combined microbial genomics, evolvability, mutation processes and other things I tend to focus on.

Steven Benner gave a talk that I had to skip out on early because I was doing a radio interview. At the beginning, Benner said something that annoyed me: he complained about prior talks that referred to “Rosetta Stone” methods of predicting function (I was one of the people who mentioned this) because he thought we were referring to BLAST searches. He clearly was not paying attention, as the Rosetta Stone method predicts function by finding connections between non-homologous proteins: if some other protein contains domains found in both of the proteins of interest, the two are inferred to be functionally linked. Oh well, I was glad I had to leave early, because I was itching to jump up and correct him.
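To make the Rosetta Stone idea concrete (and to show why it is not just a BLAST search), here is a minimal Python sketch. It assumes each protein has already been reduced to a set of domain labels; the protein names and labels are hypothetical. In practice the method uses sequence similarity searches to detect fused proteins, and treating "shares no domain label" as "non-homologous" is a simplification made for this sketch.

```python
from itertools import combinations


def rosetta_stone_links(proteins):
    """Predict functional links between non-homologous proteins.

    proteins: dict mapping a protein name to a set of domain labels.
    Returns a set of (a, b) name pairs, a < b, for which some third
    protein (the "Rosetta Stone") carries domains from both a and b.
    """
    links = set()
    for a, b in combinations(sorted(proteins), 2):
        domains_a, domains_b = proteins[a], proteins[b]
        if domains_a & domains_b:
            # Shared domain: treat the pair as homologous and skip it.
            continue
        for c, domains_c in proteins.items():
            if c in (a, b):
                continue
            # A fusion protein containing a domain from each partner.
            if domains_c & domains_a and domains_c & domains_b:
                links.add((a, b))
                break
    return links


# Hypothetical toy data: FusionAB carries domains found separately in ProteinA and ProteinB.
example = {
    "ProteinA": {"dom1"},
    "ProteinB": {"dom2"},
    "FusionAB": {"dom1", "dom2"},
    "ProteinC": {"dom3"},
}
print(rosetta_stone_links(example))  # {('ProteinA', 'ProteinB')}
```

Note that ProteinA and ProteinB share no domain, so a simple homology search would never connect them; the link comes only from the fused protein.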

Heather Allen, from Jo Handelsman’s lab, gave a very good talk about doing functional metagenomic screens for genes encoding antibiotic resistance. She has been using DNA from multiple soil sites, including a pristine site in Alaska, and screening that DNA in E. coli for antibiotic resistance genes. These screens identify a wide diversity of genes, including some novel forms. This work helps highlight the need not just to sequence the snot out of the world but also to do some functional assays at the same time. In addition, she mentioned that she was able to come to the meeting because Jo Handelsman set up a fund that mothers can use to bring a babysitter with them to meetings. All I can say is that Jo Handelsman was already one of my favorite people in science, and this is just another brilliant and wonderful thing that she does.

David Relman gave a talk about two microbiome studies his lab has been doing: (1) a study of marine mammals comparing the microbial diversity on their surfaces with the diversity in the surrounding water and the diversity on their insides, and (2) a study of the response of the human gut microbial community to antibiotic treatment. I am particularly fond of the antibiotic treatment study because they are treating it as an “ecological disturbance” study and analyzing it much like ecologists would analyze the recovery of a forest after a fire. I think we definitely need more ecologists to bring their techniques and skills to human microbiome studies, so this was exciting to see.

Ashlee Earl gave a talk about biofilm formation in Bacillus subtilis. Much like Kim Lewis’s talk earlier, this talk was in an area I know little about, and I guess you could say it kind of blew my mind. It seems that in B. subtilis, and I guess in many other microbes, biofilms are in essence analogous to multicellular organisms. Within a biofilm there are different types of cells with different roles, and the patterns are highly reproducible and organized. It seems to me that the boundary between multicellular and single-celled organisms is getting blurrier and blurrier. Ashlee reported on some cool experiments in which she collected strains from around the world and then did comparative genetics and genomics of their biofilm formation patterns.

Alas, I missed Mary Lidstrom’s talk, which, based upon prior experience, I am sure was fascinating. She has been studying processes inside single bacterial cells and has been developing a suite of techniques and tools to carry out such studies. Maybe someone else from the meeting can post details about her talk.

Unfortunately, I had to be on a conference call during some of the next talks, so I do not have details of those for the blog. Then I returned and served as chair for a session. I did take some notes, so here goes.

Byung-Kwan Cho gave a tour de force talk about reconstructing the transcriptional regulatory network in E. coli. He presented results from a dazzling and dizzying array of genome-scale methods (e.g., ChIP-chip, tiled arrays, sequencing, etc.) used to characterize transcriptional regulation. In addition, he did some complex, large-scale computational work to combine all of the data and characterize the networks. It was quite impressive stuff.

Ginger Armbrust talked about her favorite critters, diatoms, and focused on how she and her colleagues used the genome data to characterize silicon deposition processes. She made a convincing case for the importance of diatoms and for the value of having the genome sequences from some species. She also discussed some of the challenges of using the genome data, including the challenges of gene prediction for microbial eukaryotes, and her dream of using some of the new genomic information as part of real-time sensors in the oceans.

Athanasios Typas discussed work to build tools for carrying out genome-scale analyses of genetic and chemical-genetic interactions. For example, they are taking two comprehensive gene knockout (KO) libraries from E. coli, using them to create all possible double mutants, and then screening those mutants for whether they have the same or different phenotypes than expected from the single mutants. This allows them to look for gene-gene interactions. They are also doing this type of analysis with chemical-gene interactions.
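The talk did not go into scoring details, but one common way to ask whether a double mutant differs from what its single mutants predict is a multiplicative model: with fitness scaled so that wild type = 1, the expected double-mutant fitness is W_a × W_b, and the interaction score is the deviation W_ab − W_a × W_b. Below is a minimal Python sketch of that model. The threshold and fitness values are hypothetical, and this is not necessarily the scoring scheme used in this particular screen.

```python
def interaction_score(w_a, w_b, w_ab):
    """Multiplicative-model genetic interaction score.

    Fitness values are relative to wild type (wild type = 1.0).
    Negative scores mean the double mutant is sicker than expected
    (aggravating); positive scores mean it is healthier than expected
    (alleviating).
    """
    expected = w_a * w_b
    return w_ab - expected


def classify(score, threshold=0.1):
    """Call an interaction only when the deviation exceeds a (hypothetical) cutoff."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "none"


# Hypothetical fitness measurements for two single mutants and three possible double mutants.
w_a, w_b = 0.8, 0.9
for w_ab in (0.3, 0.75, 0.95):
    s = interaction_score(w_a, w_b, w_ab)
    print(f"W_ab={w_ab:.2f} score={s:+.2f} -> {classify(s)}")
```

In a genome-wide screen the same calculation would simply be repeated for every gene pair; a chemical-gene version would replace one of the mutations with a drug treatment.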

Devaki Bhaya gave a brief talk on what I think is the single most interesting thing in all of microbiology right now – CRISPRs. These are clustered regularly interspaced short palindromic repeats. She is studying them in cyanobacteria from Yellowstone hot springs.

Good quotes from the meeting:

  • So we simply sequenced the genome of the different variants
  • Antibiotics do not kill things, they corrupt them
  • Dormancy is the default mode of most bacterial life
  • Who knows what a yoctomole is?
  • I am going to defend genomics
  • There comes a point in life when you have to bring chemists into the picture
  • Gosh, was that today or yesterday
  • The rectal swabs are here in tan color
  • I’ll try to let the pictures do the talking and I will get out of the way
  • Our model system du jour
  • And there’s Jeffrey Dahmer
  • And this is my cheesy analogy here
  • He could not be here so I am here. His loss. My gain. Hopefully not your loss.
  • We are the environment. We live the phenotype.
  • If I have time I will tell you about a dream
  • Every fifth breath – thank a diatom
  • While we still have poles
  • A paper came out next year

My first PLoS One paper …. yay: automated phylogenetic tree based rRNA analysis

Well, I have truly entered the modern world. My first PLoS One paper has just come out. It is entitled “An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP)” and well, it describes automated software for analyzing rRNA sequences that are generated as part of microbial diversity studies. The main goal behind this was to keep up with the massive amounts of rRNA sequences we and others could generate in the lab and to develop a tool that would remove the need for “manual” work in analyzing rRNAs.

The work was done primarily by Dongying Wu, a Project Scientist in my lab, with assistance from Amber Hartman, who is a PhD student in my lab. Naomi Ward, who was on the faculty at TIGR and is now at Wyoming, and I helped guide the development and testing of the software.

We first developed this pipeline/software in conjunction with analyzing the rRNA sequences that were part of the Sargasso Sea metagenome, and results from this work were included in the Venter et al. Sargasso paper. We then used the pipeline, and continued to refine it, as part of a variety of studies, including a paper by Kevin Penn et al. on coral-associated microbes. Kevin was working as a technician for me and Naomi and is now a PhD student at Scripps Institution of Oceanography. We also had some input from various scientists we were working with on rRNA analyses, especially Jen Hughes Martiny.

We made a series of further refinements and worked with people like Saul Kravitz from the Venter Institute and the CAMERA metagenomics database to make sure that the software could be run outside of my lab. And then we finally got around to writing up a paper …. and now it is out.

You can download the software here. The basics of the software are summarized below (see the flow chart too):

  • Stage 1: Domain Analysis
    • take an rRNA sequence
    • blast it against a database of representative rRNAs from all lines of life
    • use the blast results to help choose sequences to use to make a multiple sequence alignment
    • infer a phylogenetic tree from the alignment
    • assign the sequence to a domain of life (bacteria, archaea, eukaryotes)

  • Stage 2: First pass alignment and tree within domain
    • take the same rRNA sequence
    • blast against a database of rRNAs from within the domain of interest
    • use the blast results to help choose sequences for a multiple alignment
    • infer a phylogenetic tree from the alignment
    • assign the sequence to a taxonomic group

  • Stage 3: Second pass alignment and tree within domain
    • extract sequences from members of the putative taxonomic group (as well as some others to balance the diversity)
    • make a multiple sequence alignment
    • infer a phylogenetic tree

From this pipeline we end up with an alignment, which is useful for things such as counting the number of species in a sample, as well as a tree, which is useful for determining what types of organisms are in the sample.
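The staged narrowing (all of life, then within a domain, then within a taxonomic group plus a few outside sequences) can be sketched in Python as below. This is not STAP's code and only illustrates the control flow. Two substitutions are made loudly: the BLAST search is replaced by a ranking on k-mer Jaccard similarity, and the alignment-plus-tree step is replaced by taking the nearest reference, which is exactly the top-hit shortcut a tree-based pipeline is meant to improve on. The `RefSeq` record, the function names and the toy sequences are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSeq:
    """A reference rRNA record (hypothetical structure for this sketch)."""
    name: str
    domain: str  # "Bacteria", "Archaea" or "Eukaryota"
    group: str   # taxonomic group within the domain
    seq: str


def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of k-mer sets (stand-in for a BLAST score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def top_hits(query, refs, n):
    """Stand-in for a BLAST search: the n most similar references."""
    return sorted(refs, key=lambda r: similarity(query, r.seq), reverse=True)[:n]


def nearest_label(query, candidates, attr):
    """Stand-in for 'align, infer a tree, read off the placement'.

    Taking the closest reference is the top-hit shortcut that a real
    tree-based placement is meant to improve on.
    """
    best = max(candidates, key=lambda r: similarity(query, r.seq))
    return getattr(best, attr)


def staged_classify(query, refs, n_hits=5, n_outside=2):
    # Stage 1: search all of life and assign a domain.
    domain = nearest_label(query, top_hits(query, refs, n_hits), "domain")

    # Stage 2: search only within that domain and assign a taxonomic group.
    in_domain = [r for r in refs if r.domain == domain]
    group = nearest_label(query, top_hits(query, in_domain, n_hits), "group")

    # Stage 3: collect group members plus a few outside sequences for balance;
    # a real pipeline would realign these and infer the final tree.
    members = [r for r in in_domain if r.group == group]
    outside = top_hits(query, [r for r in in_domain if r.group != group], n_outside)
    return domain, group, [r.name for r in members + outside]


refs = [
    RefSeq("ecoli_like", "Bacteria", "Proteobacteria", "AGAGTTTGATCCTGGCTCAG"),
    RefSeq("bacillus_like", "Bacteria", "Firmicutes", "TTTTGGGGCCCCAAAATTTT"),
    RefSeq("methano_like", "Archaea", "Euryarchaeota", "CCGGCCGGCCGGAATTAATT"),
    RefSeq("yeast_like", "Eukaryota", "Fungi", "GATCGATCGATCGATCGATC"),
]
result = staged_classify("AGAGTTTGATCCTGGCTCAG", refs)
print(result)
```

A real run would swap in an actual similarity search, a multiple sequence aligner and a tree-building program at the marked points; the value of the staged design is that each later search is restricted to a smaller, more relevant set of references.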

I note: the key is that it is completely automated, can be run on a single machine or a cluster, and produces results comparable to those of manual methods. In the long run we plan to connect this to other software that we and other labs develop, to build a metagenomics and microbial diversity workflow that will help in processing the massive amounts of sequence data generated for microbial diversity studies.

I should note this work was supported primarily by a National Science Foundation grant to me and Naomi Ward as part of their “Assembling the Tree of Life” Program (Grant No. 0228651). Some final work on the project was funded by the Gordon and Betty Moore Foundation through grant #1660 to Jonathan Eisen and the CAMERA grant to UCSD.

Wu, D., Hartman, A., Ward, N., & Eisen, J. (2008). An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP). PLoS ONE, 3(7). DOI: 10.1371/journal.pone.0002566

Open Science Highlight: JoVE, the Journal of Visualized Experiments

I am starting to browse around at JoVE (the Journal of Visualized Experiments). This is an open journal dedicated to publishing biological research in a visual format. It is pretty cool. Some good videos include ones by Jared Leadbetter’s group on studying microbes inside termites (see Layers of Symbiosis – Visualizing the Termite Hindgut Microbial Community and Extracting DNA from the Gut Microbes of the Termite (Zootermopsis nevadensis)), for example. Others of interest to this blog include one by Ed DeLong’s group on Large-Scale Screens of Metagenomic Libraries.
Anyway – JoVE is definitely worth checking out … visual presentations are probably the way of the future …
In addition to experiments, JoVE also has some interviews. For example, there is a nice video from JoVE of Ed DeLong talking about microbial communities (Microbial communities in nature and laboratory – interview (Video Protocol)). The video is at the link below.

http://www.jove.com/index/embed.stp?ID=202

Metagenomics just keeps getting bigger …

Yet another tip of the hat from the scientific community to the growing field of metagenomics. Today Ed DeLong, one of the pioneers of using metagenomic methods to study microbes, was elected to the National Academy of Sciences. Congrats to Ed for this well-deserved recognition (now, I note, he has done many things in ocean microbiology that are not metagenomics … but we will pretend here that this was all about his metagenomics work).

Other people elected of particular relevance to this blog — David Hillis, a great evolutionary biologist, and Rosemary Grant, of Darwin’s finches fame.

Trip to CALIT2 & CAMERA

Just got back from a one day trip to San Diego to visit the folks who run CAMERA, the metagenomics database being run out of CALIT2/JCVI. The main point of this meeting was to start to figure out how to take computational tools that we have developed in my lab or will develop in my new iSEEM project (with Katie Pollard and Jessica Green) and make them available in CAMERA.

But as usual, the most fun part of the trip was seeing the CALIT2 toys. And boy do they have toys. Larry Smarr, the director of CALIT2 and the PI on the CAMERA project (funded by the Gordon and Betty Moore Foundation), gave us a quick tour around the building. My slide show is embedded below. Mostly we got to see the massive multi-monitor “OptIPortal” display walls. We also got to see the big Linux cluster that is the guts of CAMERA (and my favorite part, of course: the big PLoS Biology sign relating to the Global Ocean Survey papers in front of the computer).

CAMERA, which stands for Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (thus, why we say CAMERA), is a big and complex enterprise, hosting metagenomics sequence data, metadata associated with the sequence, and a variety of analysis tools for working with the data. You can find out more about it in a paper from PLoS Biology here.

Now, CAMERA is not the only metagenomics database out there. The other main one people seem to use is IMG/M from JGI. If you are interested in metagenomics analysis in any way, it is worth becoming familiar with both systems.


Metagenomics Education

Just a quick one here. I was reminded recently about an interesting publication about metagenomics education in which some people might be interested. It is by Anne Jurkowski, Ann Reid and Jay Labov and was published in a journal called CBE Life Sciences Education. This journal, though not fully Open Access, is freely available online.

I think the article is a useful call to arms for educators to get ahead of the curve and to start thinking about ways to teach metagenomics BEFORE it becomes an old field (i.e., while it is hot, and who knows how long that will last).

iSEEM Wants You — We are Recruiting Post Docs in Metagenomics Informatics

I am pleased to announce that a new project on “Integrating Statistical, Evolutionary, and Ecological Approaches to Metagenomics (iSEEM)” is getting up and running. The iSEEM Project, funded by the Gordon and Betty Moore Foundation, takes an integrated, interdisciplinary approach to metagenomic analysis.

The project spans three labs (mine and those of Katherine Pollard (who is in the Davis Genome Center where I am) and Jessica Green (who is at U. Oregon)), each with different areas of focus. Overall, the plan is to develop and apply novel methods for analyzing metagenomic data with a focus on three main topics: phylogenetic characterization of organisms, ecological diversity, and population genomics. We will be posting more detail about the project at http://iseem.org.

We are seeking five post-doctoral scientists and a bioinformatics engineer to work on methodology for analysis of metagenomic data as part of this collaborative project. Each position will be associated with one of the PIs at their home institution. If you are interested in microbial diversity, metagenomics, or genome evolution, are looking for a post doc, and want to be part of an interdisciplinary collaborative project, please apply.

More detail on the jobs is below:

Qualifications

  • We are looking for people with a demonstrated interest in working at the interface between the quantitative and biological sciences. We will offer a generous salary and benefits commensurate with experience.
  • Postdocs: Applicants should have a PhD in a biological, computational, mathematical, or statistical field. Programming skills are highly desirable.
  • Bioinformatics Engineer: Applicants should have substantial experience with database programming (e.g. SQL), scripting (e.g. Perl or Python), and bioinformatics tools.

Term: Appointments will last 2 years beginning in Summer 2008.

To Apply: Please apply using our online application system. You will be asked for:

  1. a brief cover letter explaining your background, career interests, and preferred geographical location for work (if any),
  2. CV (including publications),
  3. names and contact information for three references.

Online Application System:

Sharpshooters, dual symbioses and new ways to sequence a genome

Those interested in symbioses and in new sequencing methods should look at a paper that just came out in PNAS by John McCutcheon and Nancy Moran (OK, I am a bit biased: this work is related to something I did previously with Nancy). Their paper reports a further dissection of a dual symbiosis in sharpshooters (a group of insects that feed on xylem sap). The dual symbiosis involves two types of bacteria that live inside specialized cells in the gut of these insects.

Previously, my group had worked with Nancy to sequence the genome of one of the symbionts (Baumannia) as well as part of the genome of the second one (Sulcia). Nancy was interested in this symbiosis for many reasons, including that, as obligate xylem feeders, the sharpshooters almost certainly were not getting all the nutrients they needed from their diet. Based on what was known about bacterial symbionts in other sap-feeding insects (e.g., aphids), it seemed likely that the symbionts of the sharpshooters were making the missing nutrients for their host. However, all previous genome-based studies had been done on phloem-feeding insects like aphids. Phloem and xylem are the two main vascular transport systems in plants. Phloem tends to be nutrient rich, although still not rich enough for aphids to live on it alone. Thus aphids rely on bacterial symbionts to make amino acids missing from the phloem.

Xylem is generally much poorer in nutrients, and thus Nancy wanted to compare the genomes of the symbionts of xylem feeders with those of phloem feeders. Nancy and others had done preliminary work on the sharpshooters showing that they had multiple symbionts living inside cells in their gut and that one of the symbionts (which she named Baumannia after Paul Baumann, with whom she had worked previously) was closely related to the Buchnera symbionts found in aphids.

So Nancy approached me when I was at TIGR and asked if I would be interested in helping her sequence the Baumannia genome. I said yes (secretly, truth be told, I would have tried to sequence the genome of a rock if Nancy asked. She is perhaps the smartest person I know in all of science and is always doing the coolest types of research. Plus, I figured, I might also be able to interact with her husband, Howard Ochman, who also does cool stuff).

Of all the possible sharpshooters (the symbionts are found in all sharpshooters), Nancy chose to focus on the glassy winged sharpshooter because it is an important pest organism (it is a vector for Pierce’s disease in grapes).

So – we (well, the core facility at TIGR under my supervision) sequenced the Baumannia genome using DNA that Nancy had isolated from dissections of the gut of glassy winged sharpshooters. In analysis of the genome we (well, again, the royal we — in this case Dongying Wu in my lab did most of the analysis) found, among many things, that Baumannia appeared to be making vitamins and cofactors for the host. But alas, we also found something missing — Baumannia did not appear to be able to make amino acids for the host. Since xylem was likely to be missing amino acids that all animals require in their diet, we had figured that Baumannia must be making them for the host. So we were vexed.

That was, until Nancy pointed out (or reminded us – since she probably had mentioned it before) that there was another symbiont living in the gut of these insects — a symbiont called Sulcia. She suggested that we look at the DNA sequence pieces that did not assemble with the Baumannia genome and look for any that might encode genes similar to genes from the group of bacteria in which Sulcia is found. And, 1.5 years later, after much informatics and lab work, we obtained about 130 kb of the genome of this second symbiont and found that it encoded at least some of the essential amino acid synthesis pathways that could make the needed amino acids for the host. And we stopped there, published a paper in PLoS Biology proposing the existence of a dual symbiosis with one symbiont making vitamins and cofactors and the other making amino acids, and moved on to other things.

Now, in this new paper, Nancy’s lab has returned to this symbiosis and has finished the genome of Sulcia (the genome is available here in GenBank). And the story just gets cooler and cooler. With this complete genome they get a more detailed picture of the symbiosis than we were able to obtain, and are able to really reconstruct the whole system (and correct some mistakes we had made in our paper). My favorite thing in their paper is Figure 3, which you can find here (I am not sure about the PNAS policy on putting the image in my blog, since this does not seem to be an Open article). This figure shows their reconstruction of what could be called community metabolism. Interestingly, it appears the symbionts depend on each other and are not just passing things on to the host separately.

Another important aspect of their paper is that it is the first (as far as I know) example of a genome being finished using a combination of the two hot new sequencing methods – 454/Roche and Illumina/Solexa. Basically they used the Roche/454 method to provide deep coverage of the Sulcia genome and then used Illumina/Solexa sequencing to get accurate sequence data for the types of sequence for which the Roche method does not work well.

So – check out the paper in PNAS. You won’t regret it.

Metagenomics leads to discovery of smallest primate

Check out the post at Suicyte Notes on the discovery of a very very small novel primate from analyzing metagenomic data. Man, metagenomics rocks.