My twitter wrap up of the Joint Genome Institute User Meeting #JGIUM

I am off to another meeting, so I don’t have time to write up the details of the JGI User Meeting that just ended. Instead I am posting my tweets and some related tweets here. Also, apparently videos of the talks will be available soon. I will try to clean up the style of the posts ASAP, but I am on the road …

kucsbl CSBL at Korea Univ.
RT @phylogenomics Rob Knight discussing the rationale behind his UNIFRAC metric to comparing communities using phylogeny #JGIUM
5 hours ago

CackleofRad CackleofRad
@Tideliar @jadebio You know who else I think is awesome? Try @phylogenomics and maybe #JGIUM Not medical per se but cool things in the werks
17 hours ago

phylogenomics Jonathan Eisen

Oy vey: got home from #JGIUM & my kids had chlamydia, anthrax, E. coli, malaria, athlete’s foot, Helicobacter & giardia yfrog.com/gzk75auj
20 hours ago

phylogenomics Jonathan Eisen
Next up at #JGIUM Dan Distel on Shipworm symbionts – note here is a picture of Dan (foreground) from 1992 cruise http://twitpic.com/4cvb3w
22 hours ago

phylogenomics Jonathan Eisen

Apologies all – have to skip end of #JGIUM – follow this hashtag for others’ posts
23 hours ago

phylogenomics Jonathan Eisen
Scholin hacked into unsecured wireless network while on vacation in Death Valley to communicate w/ his sensors in Newport Beach #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Scholin has some of these remote sensors moored off of Newport Beach pier to survey for toxic diatoms #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Holy ecogenomic sensor batman – this remote ESP thing is very cool mbari.org/ESP/esp_2G.htm – can do DNA analysis remotely #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Next up Chris Scholin on remote detection of marine microbes in coastal waters and deep sea #JGIUM
24 Mar

phylogenomics Jonathan Eisen
In the Q and A period for Dan Distel some music has now come on over the speakers #oscars? #exitmusic #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Distel discussing proteomics of shipworm symbionts communities – uses method that focuses on proteins in symbionts not host #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Oh no – Distel mentioned the CaZome when discussing carbohydrate active enzymes in shipworm symbionts #badomicsword #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Distel: the gill symbionts are in addition to symbionts living in the gut that are known to degrade cellulose #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Distel: many (~10) types of closely related bacterial symbionts live inside the cells of shipworm in the gill #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Note – here is a link to my #PLoSONE paper with Distel on the genome of a shipworm symbiont plosone.org/article/info:d… #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Distel: shipworms are very diverse and can live off all sorts of wood & wood like stuff – major wood consuming organisms in ocean #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Dan Distel Works at the Ocean Genome Legacy foundation #JGIUM #coolgroup oglf.org/DistelCV.htm
24 Mar

phylogenomics Jonathan Eisen
Distel: shipworms (which are actually clams) cause billions of dollars of economic damage each year #JGIUM
24 Mar

doe_jgi Joint Genome Inst.
RT @phylogenomics Distel expressing thanks to JGI b/c despite claims by many that sequencing is free, nobody has told his acctg dept #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Distel expressing thanks to JGI b/c despite claims by many that sequencing is free, nobody has told his accounting department #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Next at #JGIUM Dan Distel on shipworm symbionts – note here’s a pic of Dan (foreground) from ’02 http://www.scancafe.com/p-59386415-38fa9f
24 Mar

sharmanedit Anna Sharman
Wow. MT @phylogenomics [Ed] Buckler: any two corn [maize] plants are as different from each other as humans and chimpanzees #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Next at #JGIUM Mary Ann Moran; Note paper I have w/ her is one Nature fails to make free despite promises nature.com/nature/journal… #opengate
24 Mar

phylogenomics Jonathan Eisen
Buckler described Genotyping by sequencing method from his in press #PLoSOne paper #JGIUM maizegenetics.net/images/stories…
24 Mar

phylogenomics Jonathan Eisen
Buckler suggests the genetic diversity in maize allows it to adapt to local environments better than other species #JGIUM #notbuyingit
24 Mar

phylogenomics Jonathan Eisen
Buckler: genomic domestication analysis w/ Ross-Ibarra lab from #ucdavis: little loss in diversity from landraces -> improved lines #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Buckler: Tripsacum genome has different retrotransposons than maize but otherwise may be useful as source of genetic variants #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Buckler also sequencing Tripsacum – sister genus of maize #JGIUM no chromosomal duplications, very similar gene content to maize
24 Mar

phylogenomics Jonathan Eisen
Buckler: doing maize HAPMAP2 to survey genetic diversity in corn #JGIUM
24 Mar

leonidkruglyak Leonid Kruglyak
So are yeast strains RT @phylogenomics: Buckler: any two corn plants as different as human and chimp #JGIUM #myspeciesisbetterthanyours
24 Mar

phylogenomics Jonathan Eisen
Buckler: any two corn plants are as different from each other as humans and chimpanzees #JGIUM #myspeciesisbetterthanyours
24 Mar

leonidkruglyak Leonid Kruglyak
Ed Buckler RT @phylogenomics: Next up Ed Buckley from Cornell discussing sequencing/using maize genome #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Next up Ed Buckley from Cornell discussing sequencing/using maize genome #JGIUM
24 Mar

sarahcpwilliams sarahcpwilliams
@phylogenomics enjoying yr tweets from #jgium. i’m a big fan of ley and knight. covered their work last year for hhmi: http://bit.ly/g2NEug
24 Mar

phylogenomics Jonathan Eisen
Ley : to study diversity of microbes associated with maize had to get primers that did not amplify chloroplast rDNA #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Ley: doing a QTL experiment on maize/corn treating microbes as their quantitative trait #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Ley now expanding human microbiome GWAS twin study to include surveying microbes all over body #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Ruth Ley: GWAS studies of human twins has IDd many loci that appear to affect microbial diversity – including some immune system loci #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Ruth Ley is now doing GWAS studies w/ human twins where the phenotype they are looking at is microbial diversity #JGIUM #verycool
24 Mar

phylogenomics Jonathan Eisen
Ruth Ley discussing survey of microbes in one child over two years #JGIUM
24 Mar

phylogenomics Jonathan Eisen
Ruth Ley now up at the JGI User Meeting discussing maize, human microbiotas #JGIUM … Note – I love her work #brilliant
24 Mar

kevinswilson66 Kevin Scott Wilson
@phylogenomics : Many thanks for your notes on #JGIUM . I was captivated by them
23 Mar

Symbiologica Juliana Mastronunzio
Thanks for tweets on the JGI meeting #JGIUM from @phylogenomics and @iGenomics.
23 Mar

sdaxen Seth D. Axen
RT @phylogenomics: Schuster: “I would like to finish my talk by discussing sequencing the devil” #JGIUM
23 Mar

Pathh1 Pat Heslop-Harrison
Done that – I found the devil in the detail. RT @phylogenomics: Schuster “like to finish my talk by discussing sequencing the devil” #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: “I would like to finish my talk by discussing sequencing the devil” #JGIUM
23 Mar

iGenomics Dawei Lin
RT @phylogenomics: Schuster: Stays away from traditional sources of ancient DNA like bone and uses hair #JGIUM
23 Mar

doe_jgi Joint Genome Inst.
RT @phylogenomics Schuster: got .1g of hair from 200 year old mammoth sample from Russia and can get mitochondrial genome #JGIUM #fb
23 Mar

phylogenomics Jonathan Eisen
Schuster: got .1g of hair from 200 year old mammoth sample from Russia and can get mitochondrial genome #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster working on thylacine (tasmanian tiger) mitochondrial genomes #JGIUM thylacine.psu.edu
23 Mar

phylogenomics Jonathan Eisen
For more on mammoth genomics see mammoth.psu.edu #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: when you sample extinct organisms you have to remember that different samples may come from different times #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: “You would not believe how much mammoth hair I have washed off myself” #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: Stays away from traditional sources of ancient DNA like bone and uses hair #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster – redundancy in genome sequencing with ancient genomes helps build quality genome assemblies #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: one reason to focus on mitochondrial genomes is that there are lots of copies of the genome per cell #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: “don’t forget about mitochondrial genomes” still lots of species that do not have mt genome sequences #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Schuster: discussing mammoths, moas, thylacines, tasmanian devils and polar bears #JGIUM #museomics #conservation #endangered
23 Mar

phylogenomics Jonathan Eisen
For more on Schuster’s work on extinct species see cidd.psu.edu/people/scs19 #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Next up Stephan Schuster discussing the Genomics of Extinct and Endangered Species #JGIUM #museomics
23 Mar

Energy_Science Energy Science News
@doe_jgi: RT @phylogenomics Why #badomics words can also be very good: a case in study with museomics #JGIUM http://ff.im/-zyA86 #fb
23 Mar

doe_jgi Joint Genome Inst.
RT @phylogenomics Why #badomics words can also be very good: a case in study with museomics #JGIUM http://ff.im/-zyA86 #fb
23 Mar

phylogenomics Jonathan Eisen
Why #badomics words can also be very good: a case in study with museomics #JGIUM http://ff.im/-zyA86
23 Mar

phylogenomics Jonathan Eisen

Eddy Rubin at the #JGIUM is soliciting input from crowd on future needs of the community
23 Mar

Energy_Science Energy Science News
@doe_jgi: #JGIUM bingo anyone? DOE JGI Director Rubin mentions @phylogenomics GEBA project in his talk on future of DOE JGI #fb
23 Mar

phylogenomics Jonathan Eisen

The perils of giving out #badomics word awards – a prior recipient at #JGIUM just told me he’s still angry at me phylogenomics.blogspot.com/2009/01/worst-…
23 Mar

doe_jgi Joint Genome Inst.
#JGIUM bingo anyone? DOE JGI Director Rubin mentions @phylogenomics GEBA project in his talk on future of DOE JGI #fb
23 Mar

Energy_Science Energy Science News
@doe_jgi: #JGIUM Rob Knight: “There is one universal tree of life which is why projects such as @phylogenomics GEBA are so critical” #fb
23 Mar

phylogenomics Jonathan Eisen
All I can say is that when I was rejected by HHMI a few yrs ago I felt better when I heard Rob Knight got it b/c, well, he rocks #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Rob Knight – in human microbiome studies you actually need VERY few sequences per sample to get overall trends #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Rob Knight: “There is one universal tree of life” and giving props to my GEBA genomic encyclopedia project #JGIUM
23 Mar

doe_jgi Joint Genome Inst.
#JGIUM Rob Knight: “There is one universal tree of life which is why projects such as @phylogenomics GEBA are so critical” #fb
23 Mar

phylogenomics Jonathan Eisen
Rob Knight discussing the rationale behind his UNIFRAC metric to comparing communities using phylogeny #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Rob Knight discussing how sequencing has gotten so cheap and high throughput that analysis tools are the limiting step in many cases #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Rob Knight now up at #JGIUM – he publishes more cool papers per month than just about anyone in microbial research
23 Mar

phylogenomics Jonathan Eisen
Silver: took sugar secreting cyanobacterium and got macrophage to take them up and they survive a little bit #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Silver: injected sugar secreting cyanobacterium into zebrafish zygotes and get functional fish with cyanos all throughout them #jgium
23 Mar

phylogenomics Jonathan Eisen
Silver: engineered a cyanobacterium secrete sugars so thought maybe they could use this to make photosynthetic animals #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Silver also interested in biohydrogen production #JGIUM but two problems: most hydrogenases are oxygen sensitive and electron competition
23 Mar

phylogenomics Jonathan Eisen
Silver is working on engineering 3-hydroxypropionate carbon fixation pathway from Chloroflexus in E. coli #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Silver claimed that Cyanobacteria are responsible for 50% of photosynthesis on earth but I think that must be too high #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Silver working on redesigning photosynthesis via cyanobacteria #JGIUM – says they need to learn a lot of biology still
23 Mar

phylogenomics Jonathan Eisen
Silver: though she tries to get $$ from basic science agencies, they never fund her #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Pam Silver: uses “redesign of a system can test our understanding of its components” to try to get $$ from basic science agencies #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Pam Silver “Biology is the technology of this century” is the message she wants to get across #JGIUM
23 Mar

phylogenomics Jonathan Eisen
Jerry Tuskan getting some hard but good questions after his talk at #JGIUM – Q and A much more interesting than talks usually
23 Mar

phylogenomics Jonathan Eisen
Next up at #JGIUM Pam Silver – not only brilliant – but also her lab is the source of a good Lady Gaga spoof youtube.com/watch?v=ZilqYp…
23 Mar

phylogenomics Jonathan Eisen

Tuskan using Genome and RNA sequencing and high throughput phenotyping for massive poplar association study #jgium
23 Mar

phylogenomics Jonathan Eisen

At #JGIUM listening to Jerry Tuskan discuss poplar genomics – the place is packed yfrog.com/h82u1nnj
23 Mar

iGenomics Dawei Lin
@phylogenomics Terry Hazen talked about it. It can be easily forgot how many people worked behind the scene. #jgium
22 Mar

phylogenomics Jonathan Eisen

Tell me about it: “@iGenomics: It is hard to include all people working on a project into the author list these days. #jgium”
22 Mar

Energy_Science Energy Science News
@doe_jgi: RT @phylogenomics: Personal trivia re: SLAC director Perisis Drell at #JGIUM – previous director Artie Bienenstock was a st…
22 Mar

phylogenomics Jonathan Eisen

Personal trivia re: SLAC director Persis Drell at #JGIUM – previous director Artie Bienenstock was a student of my grandfather Ben Post
22 Mar

EpiExperts Epigenetics Experts
RT @phylogenomics @doe_jgi: SLAC National Accelerator Lab Director Persis Drell kicks off DOE JGI User Mtg 5pm-use #JGIUM to follow
22 Mar

More coverage of the GEBA "Phylogeny Driven Genomic Encyclopedia"

Just a quick note here to post some links to additional stories about my new paper on “A phylogeny driven genomic encyclopedia of bacteria and archaea” which was published last week in Nature (with a Creative Commons license – which is rare in Nature but is what they use for genome sequencing papers).

Carl Zimmer has an article today in the New York Times “Scientists Start a Genomic Catalog of Earth’s Abundant Microbes”  about the paper and the project.  In the article he interviews me and Hans-Peter Klenk, who was a co-author and led the culturing part of the project.  A few things to note about this:

  • It is rare to have archaea mentioned in the New York Times.
  • There is a tree that goes along with the article which is a modified version of the tree we had in our paper.  I think theirs is very nice. Kudos to their artist.
  • There is a quote by Norm Pace that is generally supportive of the project.
  • The article mentions the JGI Adopt a Microbe program and even has a shout out to Malcolm Campbell at Davidson College and his recent PLoS One paper where they discuss results from a project where they took one of the genomes from our project and used it as part of a course on genome annotation/analysis. 

For some of the story behind the paper see my recent blog post “Story Behind the Nature Paper on ‘A phylogeny driven genomic encyclopedia of bacteria & archaea’ #genomics #evolution”.

Other discussions worth checking out

Also see


Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B., Hooper, S., Pati, A., Lykidis, A., Spring, S., Anderson, I., D’haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E., Kyrpides, N., Klenk, H., & Eisen, J. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea Nature, 462 (7276), 1056-1060 DOI: 10.1038/nature08656

Bakke, P., Carney, N., DeLoache, W., Gearing, M., Ingvorsen, K., Lotz, M., McNair, J., Penumetcha, P., Simpson, S., Voss, L., Win, M., Heyer, L., & Campbell, A. (2009). Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis PLoS ONE, 4 (7) DOI: 10.1371/journal.pone.0006291

Story Behind the Nature Paper on ‘A phylogeny driven genomic encyclopedia of bacteria & archaea’ #genomics #evolution


Today is a fun day for me. A paper on which I am the senior author is being published in Nature (yes, the Academic Editor in Chief of PLoS Biology is publishing a paper in Nature, more on that below …). This paper, entitled “A phylogeny driven genomic encyclopedia of bacteria and archaea”, represents the culmination of years of work by many people from multiple institutions. Today in this blog I am going to do my best to tell the story behind the paper – about the people and the process and a little bit about the science.

First, a brief bit about the science in the paper. In this paper, we (mostly people at the Joint Genome Institute, where I have an Adjunct Appointment, but also people in my lab at UC Davis and at the DSMZ culture collection) did a relatively simple thing. We started with the rRNA tree of life as a guide. Then we identified branches in the bacterial and archaeal portions of this tree where there were no genome sequences available or in progress (this was done mostly by Phil Hugenholtz, Dongying Wu and Nikos Kyrpides). Next we searched for representatives of these “unsequenced” branches in the DSMZ culture collection (a collection of bacteria and archaea that can be grown in the lab), and we identified some 200 of these in total. The DSMZ (under the direction of Hans-Peter Klenk) then grew these organisms and sent the DNA to the Joint Genome Institute, and JGI turned on their genome sequencing muscle and sequenced the genomes of the organisms in the DNA samples. Finally, we spent a good deal of time analyzing the data, asking a pretty simple question: are there any general benefits that come from this “phylogeny driven” approach to sequencing genomes compared to what one might find with sequencing just any random genome (after all, any genome sequence could have some value)? The paper describes what we found, which is that there are in fact many benefits that come from sequencing genomes from branches in the tree for which genomes are not available.
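To make the idea of "phylogeny driven" selection a bit more concrete, here is a minimal sketch of one way to rank candidate organisms by how much new branch length they would add to the part of the tree already covered by genomes. To be clear, this is not the procedure used in the paper: the toy tree, the branch lengths, the taxon names and the greedy strategy are all assumptions made up for illustration.

```python
# Hypothetical toy tree (not real data): node -> (parent, branch length to parent).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("n2", 0.2),
    "D": ("n2", 3.0),
    "n2": ("root", 0.5),
    "E": ("root", 4.0),
}


def path_to_root(leaf, tree):
    """Return the branches (named by their child node) from a leaf up to the root."""
    branches = []
    node = leaf
    while node in tree:
        branches.append(node)
        node = tree[node][0]
    return branches


def pd_gain(leaf, covered, tree):
    """Branch length this leaf would add beyond the branches already covered."""
    return sum(tree[b][1] for b in path_to_root(leaf, tree) if b not in covered)


def pick_genomes(sequenced, candidates, k, tree):
    """Greedily pick up to k candidates, each time taking the largest gain."""
    covered = set()
    for leaf in sequenced:
        covered.update(path_to_root(leaf, tree))
    remaining = sorted(candidates)  # sorted so ties are broken deterministically
    chosen = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda leaf: pd_gain(leaf, covered, tree))
        chosen.append((best, pd_gain(best, covered, tree)))
        covered.update(path_to_root(best, tree))
        remaining.remove(best)
    return chosen


# "A" already has a genome; choose two more from the culturable candidates.
picks = pick_genomes(sequenced=["A"], candidates=["B", "C", "D", "E"], k=2, tree=TREE)
print(picks)
```

In this toy case the two long, isolated branches ("E" and then "D") get picked ahead of "B", which sits right next to an organism that already has a genome. That is the basic intuition behind targeting the unsequenced branches first.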

More on the details of the science below. But first, I want to note that this paper was truly an amazing team effort, with all sorts of people from the JGI in particular, going above and beyond the call of duty to make sure it happened and worked well. And the Department of Energy has been truly phenomenal in my opinion in supporting this project which in the end is not explicitly about “energy” per se but is really about providing a reference set of genomes that should improve the value of all microbial genome data.

Anyway, now for the story behind the story. And be prepared, because this is a bit long. But I think it is important to place this work in a bigger context both in terms of my background as well as some of the background of other people in the project. If you can’t wait for more on the GEBA project then perhaps you should go to some of these links:

And I will post more links as they come up. Below what I try to provide is some of the story behind the story:

My personal interest in applied uses of phylogenetics stage 1: undergraduate preparation at Harvard
As this paper is primarily about an applied use of phylogenetics (in selecting genomes for sequencing), I thought it would be worth going into some of how I personally became a bit obsessed with applied uses of phylogenetics. For me, my obsession began as an undergraduate at Harvard where I got exposed to the value of phylogeny as a tool from many many angles including but not limited to:

  • Freshman year taking a course taught by Stephen Jay Gould where Wayne and David Maddison were Teaching Assistants and where they were demoing their new phylogenetics software called MacClade
  • Sophomore year taking a conservation biology class with Eric Fajer and Scott Melvin where I was exposed to the concept of “phylogenetic diversity” as a tool in assessing conservation plans
  • Junior year working in the lab of Fakhri Bazzaz with people like David Ackerly and Peter Wayne who made use of phylogeny as a key tool in their research projects
  • Senior year and the year after graduating where I worked in the lab of Colleen Cavanaugh using rRNA based phylogenetic analysis to characterize uncultured chemosynthetic symbionts. I note it was in Colleen’s lab that I also became obsessed you could say with microbes and why they rock.
My personal interest in applied uses of phylogenetics stage 2: graduate school at Stanford
All of this and more gave me a strong passion for phylogeny as a tool. And so I went to graduate school at Stanford, originally to work with Ward Watt on butterflies, but then I switched to working in Phil Hanawalt’s lab on the “Evolution of DNA repair genes, proteins and processes”. And while in that lab I became pretty much obsessed with three things, all related to phylogeny.
First, I was interested in whether the rRNA tree of life, which I had used in my studies in Colleen Cavanaugh’s lab (and in my first paper in J. Bacteriology, which, thanks to ASM, is now in Pubmed Central and free at ASM’s site too), was robust or, as some critics argued, was not that useful. This was a critical question since the best way to study the phylogeny of microbes at the time, and also the best way to study uncultured microbes, was to leverage the ability to clone rRNA genes by PCR and then to build evolutionary trees of those rRNA genes. As part of my graduate work, I did a study where I compared the phylogenetic trees of rRNA to trees of another gene from the same species (I chose recA). Surprisingly, despite the claims that the rRNA tree was not very useful and that different genes always gave different trees, if you compared the two trees ONLY where there was strong support for a particular branching pattern, the trees of the two genes were in fact VERY VERY similar (a finding that had been suggested previously by others, including Lloyd and Sharp).
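For those who like to see this kind of comparison spelled out, here is a rough sketch of what "compare the two trees ONLY where there was strong support" can mean in practice. It is not the analysis from my thesis work: the taxa, the support values and the cutoff of 70 are all hypothetical. Each tree is reduced to its splits (the group of taxa on one side of a branch) with a support value, and two splits conflict only if no single tree could contain both.

```python
def compatible(split_a, split_b, taxa):
    """Two splits can sit in the same tree if some pair of their sides does not overlap."""
    a, b = set(split_a), set(split_b)
    a_rest, b_rest = taxa - a, taxa - b
    return any(not (x & y) for x in (a, a_rest) for y in (b, b_rest))


def strong_conflicts(tree1, tree2, taxa, threshold=70):
    """List pairs of well supported splits (one per tree) that contradict each other."""
    strong1 = [s for s, support in tree1.items() if support >= threshold]
    strong2 = [s for s, support in tree2.items() if support >= threshold]
    return [(s1, s2) for s1 in strong1 for s2 in strong2
            if not compatible(s1, s2, taxa)]


# Hypothetical data: split -> support value (e.g. a bootstrap percentage).
TAXA = {"A", "B", "C", "D", "E"}
RRNA_TREE = {frozenset({"A", "B"}): 95, frozenset({"A", "B", "C"}): 40}
RECA_TREE = {frozenset({"A", "B"}): 90, frozenset({"A", "B", "D"}): 35}

conflicts_strong = strong_conflicts(RRNA_TREE, RECA_TREE, TAXA, threshold=70)
conflicts_all = strong_conflicts(RRNA_TREE, RECA_TREE, TAXA, threshold=0)
print(len(conflicts_strong), len(conflicts_all))
```

In the toy data the two trees disagree only on a pair of weakly supported branches, so the disagreement disappears once the poorly supported branches are left out of the comparison.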
Second, I also became obsessed with the fact that most of the experimental studies of DNA repair processes were in a very narrow sampling of the phylogenetic diversity of organisms (e.g., at the time, no studies had been done in archaea, and most studies in bacteria were from only two of the many major groups). So I started experimental studies of repair in halophilic archaea in order to help broaden the diversity of studies. And I began to use PCR to try and clone out repair genes from various other species of diverse bacteria and archaea. Alas, as I was doing this, some institute called TIGR was sequencing the complete genomes of organisms I was trying to clone out single genes from. In fact, one of the first organisms I was working on for PCR studies was Archaeoglobus fulgidus. And when I found out TIGR was sequencing the genome, in a project led by none other than the great microbial evolutionary biologist Hans-Peter Klenk (yes, the same one who helped us in this GEBA project), I decided it was silly to try to clone out individual genes by PCR. And thus I began to learn how to analyze genomes.
It was in the course of learning how to analyze genomes that I came up with another applied use of phylogeny. I realized that one should be able to use phylogenetic studies of genes to help in predicting functions for uncharacterized genes as part of genome annotation efforts. And so I wrote a series of papers showing that this in fact worked (I did this first for the SNF2 family of proteins and then alas coined my own omics word “phylogenomics” to describe this integration of genome analysis and phylogenetics and formalized this phylogenomic approach to functional prediction). I note that what I was arguing for was that protein function could be treated like ANY other character trait and one could use character trait reconstruction methods (which I had learned about while playing with that MacClade program) to infer protein functions for unknown proteins in a protein tree. I note that this notion of predicting protein function from a protein tree is completely analogous to (and one could rightfully say stolen from) how researchers studying uncultured microbes were trying to infer properties of microbes from the position of their rRNA genes in the rRNA tree of life (as I had learned in studies of symbioses).
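As a rough illustration of treating protein function like any other character trait, here is a small sketch that uses Fitch parsimony to carry known functions across a protein tree and assign one to an uncharacterized protein. This is only a toy under stated assumptions, not the method from the papers mentioned above: the tree shape, the protein names ("p1", "query" and so on) and the function labels are hypothetical, and the tie-break at ambiguous nodes is arbitrary.

```python
def fitch_down(node, known, all_states):
    """Bottom-up pass: return (state set, annotated subtree) for a nested-tuple tree."""
    if isinstance(node, str):
        # An unannotated leaf is treated as missing data: any state is possible.
        states = {known[node]} if node in known else set(all_states)
        return states, (node, states, [])
    results = [fitch_down(child, known, all_states) for child in node]
    sets = [r[0] for r in results]
    common = set.intersection(*sets)
    states = common if common else set.union(*sets)
    return states, (None, states, [r[1] for r in results])


def fitch_up(annotated, parent_state, out):
    """Top-down pass: keep the parent's state when allowed, else pick one arbitrarily."""
    name, states, children = annotated
    state = parent_state if parent_state in states else sorted(states)[0]
    if name is not None:
        out[name] = state
    for child in children:
        fitch_up(child, state, out)
    return out


def predict_functions(tree, known):
    """Assign a function to every leaf, including those with no annotation."""
    all_states = set(known.values())
    _, annotated = fitch_down(tree, known, all_states)
    return fitch_up(annotated, None, {})


# Hypothetical protein tree: "query" branches next to two proteins known to act in repair.
PROTEIN_TREE = ((("p1", "p2"), "query"), ("p3", "p4"))
KNOWN = {"p1": "repair", "p2": "repair", "p3": "transcription", "p4": "transcription"}

predicted = predict_functions(PROTEIN_TREE, KNOWN)
print(predicted["query"])
```

The point of the sketch is that the prediction comes from where the protein sits in the tree, not from which characterized protein happens to be its best database match.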
My personal interest in applied uses of phylogenetics stage 3: my plans for a post doc
So as I was wrapping up graduate school I was seeking a way to go beyond what I was doing and combine studies of DNA repair and evolution and microbiology in another way. And I thought I had found a perfect one in a post doc I accepted with A. John Clark at U. C. Berkeley. John was the person who had discovered recA, the gene I had been using for phylogenetic analysis and for structure function studies. And he was working with none other than Norm Pace and a young hotshot in Norm’s lab, Phil Hugenholtz (as well as a few others including Steve Sandler) in trying to use the recA homolog in archaea as a marker for environmental studies of archaea. It sounded literally perfect. And so I was excited to start this job. That was, until I met Craig Venter.
Grabbing the TIGR by the tail
While I had been playing around with data from TIGR in the latter years of my time in graduate school, I also got involved in teaching a fascinating class with David Botstein, Rick Myers, David Cox and others. (As an aside, this class was part of a new initiative I helped design at Stanford on “Science, Math and Engineering” for non science majors – an initiative that was a pet project of none other than Condie Rice, who was Provost at the time.) Anyway, Rick Myers was serving as a host for one Craig Venter when he came and gave a talk at Stanford, and somehow I managed to finagle my way into being invited to go out to dinner with Craig. And at dinner, I proceeded to tell Craig that I thought some of the evolution stuff he was talking about was bogus and I tried to explain some of my work on phylogeny and phylogenomics. Not sure what Craig thought of the cocky PhD student drawing evolutionary trees on napkins, but it eventually got me a faculty job at TIGR and I worked extensively with Craig, so it must have been worth something. And so I and my fiancée Maria-Inés Benito (now wife …) moved to Maryland and spent eight great years there (my wife started in MD as a faculty member at TIGR too, but then she left to go to a company called Informax, may it rest in peace).

Most of my work at TIGR focused on a different side of phylogenomics than represented in the GEBA project. At TIGR I focused on the uses of evolutionary analysis as a component of analyzing genomes – from predicting gene function to finding duplications (e.g., see the V. cholerae genome paper) to identifying genes under unusual patterns of mutation or selection to finding organelle derived genes in nuclear genomes (e.g., see this) to studying the occurrence of lateral gene transfer or the lack of occurrence of it to studying genome rearrangement processes. And sure, every once in a while I worked on a project where the organism was the first in its major branch to have a genome sequenced (e.g., Chlorobi). And I had noted, along with others, that there was a big phylogenetic bias in genome sequencing projects (e.g., see my 2000 review paper discussing this here).

But that did not really drive my thinking about what genome to actually sequence until TIGR hired a brilliant microbial systematics expert Naomi Ward as a new faculty member. And it was Naomi who kept emphasizing that someone should go about targeting the “undersequenced” groups in the Tree of Life.

NSF Assembling the Tree of Life grant
And so Naomi and I (w/ Karen Nelson and Frank Robb) put together a grant for the NSF’s “Assembling the Tree of Life” program to do just this – to sequence the first genomes from eight phyla of bacteria for which no genomes were available but for which there were cultured organisms. Amazingly we got the grant. And we did some pretty cool things on that project, including sequencing some interesting genomes, and developing some useful new tools for analyzing genomes (e.g., STAP, AMPHORA, APIS). And I was able to hire some amazing scientists to work in my lab on the project including Dongying Wu (the lead author on the GEBA paper) and Martin Wu (who also worked on the GEBA project and is now a Prof. at U. Virginia) and Jonathan Badger. Alas, we did not publish any earth shattering papers as part of this NSF Tree of Life project on analyzing the genomes of these eight organisms, not because there was not some interesting stuff there but for some other reasons. First, I moved to UC Davis and there was a complicated administrative nightmare in transferring money and getting things up and running at Davis on this project so my lab was not really able to work on it for two years (in retrospect, what a f*ING nightmare dealing with the UC Davis grants system was …).

Then, just as things were ready to get restarted, TIGR kind of imploded and many of the people, including Naomi, my CoPI, left (though I note, my moving to Davis was unrelated to the dissolution of TIGR). But perhaps most importantly, there were some actual technical and scientific problems with our dreams of changing the world of microbiology from our phyla sampling project – the science was not quite there. In particular, having a single genome from each of these phyla was simply not enough to get (and show) the benefits that can come from improved sampling of the tree of life. And thus though we have published some cool papers from this project (e.g., see this PLoS One paper on one of the genomes), we all ended up, in one way or another, disappointed with the final results.

Davis and JGI: the return of phylogeny to genomic sampling
When I moved to UC Davis I also was offered (and accepted) an Adjunct Appointment at the Joint Genome Institute (JGI). At both places, I envisioned reinventing myself as someone who worked on studying microbes directly in the environment (e.g., with metagenomics) and symbioses (both of which I had started on at TIGR). And in fact, in a way, I have done this, since I got some medium to big grants to work on these issues. I tried diligently to attend weekly meetings at the JGI but it was difficult since technically I was 100% time at UC Davis and was in essence supposed to be at 0% time at JGI. And when JGI hired Phil Hugenholtz to run their environmental genomics/metagenomics work, I was needed less at JGI since, well, Phil was so good. It was great to go over there and interact with Eddy Rubin, Phil Hugenholtz, and Nikos Kyrpides, among others, but it was unclear what exactly I would do there with Phil running the metagenomics show.

And then, like magic, something came up. I went to one of the monthly senior staff meetings at JGI and while we were listening to someone on the speaker phone, Eddy Rubin handed me a note asking me what I thought about the proposal someone was making to sequence all the species in the Bergey’s Manual. And the light bulb of phylogeny went back on in my head. I told him (I think I wrote it down, but may have said out loud), something like “well, sequencing all 6000 or so species would be great, but it would be better to focus on the most phylogenetically novel ones first.” And in a way, GEBA was born. Eddy organized some meetings at JGI to discuss the Bergey’s proposal and I argued for a more phylogeny driven approach. And this is where having Phil Hugenholtz and Nikos Kyrpides at JGI was like a perfect storm. You see, both had been lamenting the limited phylogenetic coverage of genomes for years, just like I had. Phil had even written a paper about it in 2002 which we used as part of our NSF Tree of Life proposal. And Nikos too had been diligently working for years to make sure novel organisms were sequenced. So though we went to a meeting to discuss the Bergey’s manual idea, we instead proposed an alternative – GEBA.

And for some months, we pitched this notion to various people including at JGI, DOE, and various advisory boards. And the response was basically – “OK – sounds like it COULD be worth doing – why don’t you do a pilot and TEST if it is worth doing.” And so, with support from Eddy Rubin and DOE, that is what we did.

One key limitation – getting DNA

So Phil, Nikos and I and a variety of others started working on the general plan behind GEBA. But there was one key limitation. How were we going to get DNA from all these organisms? One possibility was to seek out diverse people in the community and have them somehow help us. This had some serious problems associated with it, not the least of which was the worry that we might end up sequencing varieties of organisms that people had in their lab but which nobody else had access to (something Naomi Ward and I had written about as a problem a few years before).

And here came the second perfect storm – none other than Hans-Peter Klenk (yes, the same one who had led some of the early genome sequencing efforts when he was at TIGR), was visiting JGI. And he had a relatively new job – at the German Culture Collection DSMZ (In fact, I should note, I had tried to get a job at TIGR even before I met Venter, since they had a position advertised for a “microbial evolutionary biologist” — but that job went to Klenk). Phil Hugenholtz had asked the Head of DSMZ, Erko Stackebrandt, if they might be interested in helping us grow strains and get DNA but we did not yet have a full collaboration with them. And Erko had suggested we contact Hans-Peter. And in his visit to JGI it became apparent that he would do whatever he could to help us build a collaboration with DSMZ. And thus we had a source of DNA. Even more amazingly to me, they did it all for free.

GEBA begins

And thus began the real work in the project. Phil used his expertise with rRNA databases, especially GreenGenes, to pull out phylogenetic trees of different groups. And Nikos used his expertise as the curator of a database on microbial sequencing projects (called GenomesOnline) to help tag which branches in Phil’s tree had sequenced genomes or ones in progress. And then they looked for whether any of the members of the unsequenced branches had representatives in the DSMZ collection. And with some help from Dongying Wu and me, we came up with a list. And with the help of the JGI “Project Management” team including David Bruce and Lynne Goodwin and Eileen Dalin and others at JGI we developed a protocol for collaborating with DSMZ and getting DNA from them.
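For those who like to see this kind of thing spelled out, here is a rough sketch of the selection logic described above. It is only an illustration, not the actual GEBA pipeline: the lineage names, taxon names, and the `pick_candidates` function are all made up, and a flat table of lineages stands in for the real rRNA tree.

```python
# Toy sketch of phylogeny-driven candidate selection (hypothetical data and names).
# A flat dict of lineage -> member taxa stands in for branches of an rRNA tree.

def pick_candidates(lineages, sequenced, in_collection):
    """Return one culturable candidate for each lineage with no sequenced member."""
    picks = {}
    for lineage, taxa in lineages.items():
        if any(taxon in sequenced for taxon in taxa):
            continue  # this branch already has a genome finished or in progress
        available = sorted(t for t in taxa if t in in_collection)
        if available:
            picks[lineage] = available[0]  # arbitrary tie-break: first by name
    return picks


lineages = {
    "lineage_A": ["taxon_1", "taxon_2"],
    "lineage_B": ["taxon_3", "taxon_4"],
    "lineage_C": ["taxon_5"],
}
sequenced = {"taxon_1"}                   # assumed tags from a genome project database
in_collection = {"taxon_2", "taxon_4"}    # assumed culture collection holdings

candidates = pick_candidates(lineages, sequenced, in_collection)
print(candidates)
```

In this toy case lineage_A is skipped because it already has a genome, and lineage_C is skipped because nothing from it is in the collection, so only lineage_B yields a candidate.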

And I became the chief cheerleader and administrator of the project, in part since Phil and Nikos were so busy with their other things at JGI. And though I was not always on the ball, the project moved forward and we started to get genomes sequenced using the full strength of the JGI as a genome center. The finishing teams at JGI worked diligently on finishing as many of the genomes as possible. And Nikos’ team at JGI made sure that the genomes were annotated. And I helped make sure that the data release policies were broadly open (which everyone at JGI supported). And after many false starts with papers on the project that were way way way too cumbersome and big, with some kicks in the pants from the director of JGI Eddy Rubin who was getting anxious about the project, we turned out the GEBA paper that was published today in Nature.

You might ask, why, as a PLoS official and PLoS cheerleader, we ended up having a paper in Nature? Well, in the end, though I am senior author on the paper, the total contribution to the work mostly came from people at JGI who did not work for me but instead worked with me on this great project. And we took some votes and had some discussions and in the end, despite my lobbying to send it to PLoS Biology, submitting it to Nature was the group decision. I supported this decision in part due to the fact that Nature uses a Creative Commons license for genome papers. But I also supported it because in the end, this was a collaboration involving many many many people and in such projects everyone has to compromise here and there. Now mind you, I am not sad to have a paper in Nature. But I would personally have preferred to have it in a journal that was fully open access, not just occasionally open like Nature.

Now I note, there were a million other things that went on associated with the GEBA project. Some of which I was not even involved in in any way. I will try to write about some of these another time, but this post is already way way way too long. So I am going to just stop here and add that I have been honored and lucky to work with people like Phil, Nikos, Hans-Peter, and others on this project and to have the people at the JGI work so hard on the background parts of this project. Thanks to all of them and to the people at DSMZ and in my lab who helped out and to the DOE for funding this work (as well as the Gordon and Betty Moore Foundation, who funded some of the work from my lab on analysis of these genomes). And last but not least, thanks to the Director of JGI Eddy Rubin, for supporting this project and for being patient with it and for kicking us in the pants when we needed to get moving on getting a paper out.

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B., Hooper, S., Pati, A., Lykidis, A., Spring, S., Anderson, I., D’haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E., Kyrpides, N., Klenk, H., & Eisen, J. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature, 462 (7276), 1056-1060. DOI: 10.1038/nature08656

Biologists rally to sequence ‘neglected’ microbes : Nature News

UPDATE: Our paper on this topic is out and there has been a bit of news here and there about it (e.g., NYTimes). For more see

—————————-
Nice little story in Nature News about the need to sequence “neglected” microbes.

Biologists rally to sequence ‘neglected’ microbes : Nature News

Quotes me and a few others. Love the fact that it quotes Stephen Giovannoni in support of this notion:

“The broad brush strokes of microbial diversity are not adequately represented in that first thousand,” says Stephen Giovannoni, a microbiologist at Oregon State University in Corvallis. “It’s absolutely important that we sequence more.”

I like this because Steve gave me enormous grief about this project at a conference last year. Though I argued with him and disagreed with him, his critiques helped guide much of our work on this project and made our paper on the work (which is in press) much better. Glad he generally is now in support of this type of project, though not sure what he thinks about our work in this area …

Here are some of my quotes:

“There’s no doubt to us that filling in the branches of the tree is going to be useful to lots of scientific studies that use genomic data,” says Eisen. “There have been four billion years of evolution and we can really benefit from having some of that information in our databases.”

All these new genomes should improve researchers’ understanding of the evolution, physiology and metabolic capacity of microbes, says Eisen. They will also help match DNA sequences to their proper species from large-scale, high-throughput metagenomic studies from environmental samples, and ultimately contribute in the fields of synthetic biology and genetic engineering.

Adopt a GEBA genome program for education – from the DOE/JGI

The DOE Joint Genome Institute’s Education Program is providing opportunities for colleges and universities across the country to “adopt” bacterial genomes, such as those sequenced as part of the “Genomic Encyclopedia of Bacteria and Archaea” (GEBA project), for analysis. This “Adopt a GEBA Genome” Education Program makes available a selection of recently sequenced genomes for use in undergraduate courses. The organisms ideally provide a unifying thread for concepts across the life sciences curriculum. For example, students can analyze the six open reading frames for a given fragment of DNA, compare the results of various gene calling algorithms, assign function by sequence homology, and use gene ortholog neighborhoods for comparative genomics and annotate biochemical pathways, while learning the underlying biological concepts in a variety of science courses.

For more information, and to apply for the November 2, 2009 deadline, see:
http://www.jgi.doe.gov/education/genomeannotation.html

For more on the GEBA project, which I am coordinating, see a video of a talk I gave about it at the JGI User meeting. Slides from that talk are on slideshare here.

Data from the GEBA project is available at a dedicated IMG site here.

http://www.scivee.tv/flash/embedCast.swf

A much much much older talk, from when we just started the project is here:

A genomic encyclopedia of bacteria and archaea (video of my talk from the JGI user meeting)

My talk at the JGI meeting on “A genomic encyclopedia of bacteria and archaea” is now up at SciVee.  See below:

http://www.scivee.tv/flash/embedPlayer.swf

And the slides are up at Slideshare

More notes from Marco Island/ AGBT

Some notes on talks here:

My favorite talk yesterday morning was David Cox from Perlegen. He had as usual some good one liners including “Everybody and their mother is doing this so doing this is not so novel. What is novel about it is that it worked.” I should add that David Cox helped shape my career indirectly in many many ways. When I was a PhD student at Stanford, I got into genomics in part by teaching a course with David Botstein, Rick Myers and David Cox. When Craig Venter offered me a job at TIGR in 1998, I was not sure if moving to a non university was a good idea or not. So I asked many people for their opinions. Some said “You must do an academic post doc or you will never get a faculty job.” I pretty much knew to ignore those folks. Cox gave the best advice. He said as long as I published things while at TIGR, it would not hurt me in any way. It probably would help. And so I took the job. And no doubt that was a great career move.

Other talks that were good were one by Joe Ecker, who discussed methylation in Arabidopsis and one by Andy Clark.

I skipped out on some of the lunch time to finish my talk for the PM session and also worked on my talk in the back of the room during the other PM talks. The PM session was on metagenomics and the most pleasing thing was that David Relman did not show up and he was replaced by Peter Turnbaugh from Jeffrey Gordon’s lab. Now – I am not saying it was good that Relman was not there – he usually gives smashingly good talks. But Turnbaugh, a PhD student, stepped in as pinch hitter and gave a great talk on gut microbiome studies, really setting the stage for the whole session. I do not know if he was nervous stepping into a session like this but it did not show if he was. He certainly seemed relaxed when he said “Thanks to Dr. Relman for getting stuck in Chicago.”

Forest Rohwer gave a good talk on metagenomics and pointed out that viruses still get ignored in this field relative to their likely importance in communities. I have written about Forest before so I am going to discuss the other talks more … but if you have not heard him talk before try to find a way. He has a VERY different perspective on genomics and metagenomics than most of the people doing it. And he is dead right about the need to do more work on viruses.

Garth Ehrlich gave a talk on “bacterial plurality” and why he thinks gene content variation within communities of microbes in biofilms is important. His data certainly seemed solid and he showed some results that call into question some aspects of the “pangenome” hypothesis (he showed that the total number of genes in the Streptococcus strain collection does seem to level off after sequencing ~30 genomes and thus that the number of genes is not infinite as some people have suggested). So I liked some aspects of his talk. But he did make some evolution statements I found disagreeable (for those who care about the nitty gritty – he showed a cluster diagram of strain similarity and then used the position of strains within the cluster diagram to reflect relative branching order and historical patterns. A cluster diagram is a bad thing to use and one should use a phylogenetic tree for this. In addition he implied that one could make a genome-phylogeny from gene presence/absence information that would be more robust than a standard alignment phylogeny. This is not a reasonable thing in my opinion – gene presence/absence patterns tend to end up grouping together unrelated lineages that have separately undergone gene loss. I just do not understand why people so badly want to not use alignments to build trees). Anyway – overall many of the things he said were interesting but I find certain non-evolution evolutionary analyses really grating.
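To make the gene loss point concrete, here is a toy example. Everything in it is invented for illustration: the four genomes, the gene names, and the choice of Jaccard distance as the similarity measure (I am not claiming this is the measure used in the talk). Two lineages, A and B, each have a "full" member and a "reduced" member that has independently lost the same accessory genes.

```python
# Toy illustration (made-up data): gene presence/absence similarity can pair up
# genomes that independently lost the same genes, rather than true relatives.

def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two gene sets."""
    return 1 - len(a & b) / len(a | b)


core = {"c1", "c2", "c3", "c4"}
accessory = {"x1", "x2", "x3", "x4"}

# "a1" and "b1" are hypothetical lineage-specific genes marking true relatedness.
genomes = {
    "A_full": core | accessory | {"a1"},
    "A_reduced": core | {"a1"},       # lineage A, lost the accessory genes
    "B_full": core | accessory | {"b1"},
    "B_reduced": core | {"b1"},       # lineage B, lost the same genes separately
}


def nearest(name):
    """Name of the genome with the smallest Jaccard distance to `name`."""
    others = (n for n in genomes if n != name)
    return min(others, key=lambda n: jaccard_distance(genomes[name], genomes[n]))


for name in genomes:
    print(name, "->", nearest(name))
```

With these sets, each reduced genome comes out closest to the other reduced genome, and each full genome closest to the other full genome, so clustering on gene content recovers "reduced vs. not reduced" instead of lineage A vs. lineage B.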

Anyway – I was going to ask him a question after his talk about this, but then decided that, since I was talking next, getting into an argument with him just before my talk might seem lame. So I passed on the question. And then I gave my talk on the need to fill in the tree of life in terms of genome sequencing projects. I discussed a project we are just wrapping up that was part of the NSF “Tree of Life” program in which we sequenced genomes of eight bacteria that are from phyla that at the time had no genomes available. And then I talked about a new project I am coordinating at the Joint Genome Institute in which we are sequencing 100 genomes to really fill in some of the bacterial and archaeal tree. Next week I will post more about this project but I note – this is not done to study the tree of life per se. It is being done because if we have reference genomes from across the tree, all of our genome analyses of other systems and of metagenomes get better.

After dinner and some shell collecting on the beach, there were evening talks and I went to the informatics session. Some of the talks there were good but the best thing I saw there was someone (I think Ben Blackburne) saying his slides were going to be on something called slideshare.net. I had never heard of this and checked it out and it seems pretty cool. I may use it in the future … but gotta go off to other things.