Misc. – Page 137 – Jonathan Eisen's Lab

Wanted – input on topics for "open access" publishing discussion at #scio10

To all

I am posting this because I will be chairing a session this upcoming weekend at Science Online 2010 on “Open Access” publishing.

And I would love input from everyone/anyone out there on what might be worth discussing at this session. Possible topics include:

why open and free are not the same thing
open access mandates
financial aspects of OA
educational uses of OA literature
things that are slowing the inevitable spread of OA publishing

I am perhaps most interested these days in the last two on that list. For example – it seems that OA publishing would spread even faster if we did not have some very conservative styles of tenure, promotion, hiring and grant review processes.

If anyone has some pressing topics that you think are worth bringing up in a discussion of OA publishing, please post them here.

Nice "Tree of life" video from Peabody Museum

I think I have written about this before but here goes again. There is a nice “Tree of Life” video from the Peabody Museum that is now on YouTube and also on their web site that is definitely worth a look for people interested in phylogenetics and the tree of life. It includes Michael Donoghue, Scott Edwards, David Hillis, Tandy Warnow and Charles Davis.

#PLoSOne paper keywords revealing: (#Penis #Microbiome #Circumcision #HIV); press release misleading

UPDATE – READ COMMENTS – LEAD AUTHOR HAS GOTTEN PRESS RELEASE CHANGED

A new paper just showed up on PLoS One and it has some serious potential to be important. The paper (PLoS ONE: The Effects of Circumcision on the Penis Microbiome) reports on analyses that show differences in the microbiota (which they call the microbiome – basically what bacterial species were present) in men before and after circumcision. And they found some significant differences. It is a nice study of a relatively poorly examined subject – the bacteria found on the penis w/ and w/o circumcision. This is a particularly important topic in light of other studies that have shown that circumcision may provide some protection against HIV infection.

In summary here is what they did – take samples from men before and after circumcision. Isolate DNA. Run PCR amplification reactions to amplify variable regions of rRNA genes from these samples. Then conduct 454 sequencing of these amplified products. And then analyze the sequences to look at the types and #s of different kinds of bacteria.

What they found is basically summarized in their last paragraph

“This study is the first molecular assessment of the bacterial diversity in the male genital mucosa. The observed decrease in anaerobic bacteria after circumcision may be related to the elimination of anoxic microenvironments under the foreskin. Detection of these anaerobic genera in other human infectious and inflammatory pathologies suggests that they may mediate genital mucosal inflammation or co-infections in the uncircumcised state. Hence, the decrease in these anaerobic bacteria after circumcision may complement the loss of the foreskin inner mucosa to reduce the number of activated Langerhans cells near the genital mucosal surface and possibly the risk of HIV acquisition in circumcised men.”

And this all sounds interesting and the work seems solid. I note that some friends / colleagues of mine were involved in this including Jacques Ravel who used to be at TIGR and now is at U MD and Paul Keim who is associated with TGen in Arizona. For anyone interested in HIV, the human microbiome, circumcision, etc., it is probably worth looking at.

However, the press release I just saw from TGen really ticked me off. The title alone did me in: “Study suggests why circumcised men are less likely to become infected with HIV”. Sure, the study did suggest a possible explanation for why circumcised men are less likely to get HIV infections, but the paper was justifiably VERY cautious about this inference. They basically state that there are some correlations worth following up.

The press release goes on to say “The study … could lead to new non-surgical HIV preventative strategies for the estimated 70 percent of men worldwide (more than 2 billion) who, because of religious or cultural beliefs, or logistic or financial barriers, are not likely to become circumcised.” Well sure, I guess you could say that. I think they are implying you could change the microbiome somehow and therefore protect from HIV but that implies (1) that there really is a causal relationship between the microbial differences and HIV protection and (2) that one could change the microbiome easily, which is a big big stretch given how little we know right now.

Anyway – the science seems fine and not over-reaching. But the press release is annoying and misleading. Shocking I know. But this one got to me.

UPDATE – SEE COMMENTS HERE AND IN FRIENDFEED. LEAD AUTHOR GOT PRESS RELEASE CHANGED.

Price, L., Liu, C., Johnson, K., Aziz, M., Lau, M., Bowers, J., Ravel, J., Keim, P., Serwadda, D., Wawer, M., & Gray, R. (2010). The Effects of Circumcision on the Penis Microbiome PLoS ONE, 5 (1) DOI: 10.1371/journal.pone.0008422

Story behind the science: #PLoS Biology paper on cichlid vision evolution

I am continuing on a new theme here in trying to get author feedback on recent PLOS publications. Today I write about a recent paper on PLoS Biology “The Eyes Have It: Regulatory and Structural Changes Both Underlie Cichlid Visual Pigment Diversity” by Christopher Hofmann, Kelly O’Quin, N. Justin Marshall, Thomas Cronin, Ole Seehausen and Karen L. Carleton

This paper discusses “how changes in gene regulation and coding sequence contribute to sensory diversification in two replicate radiations of cichlid fishes.” A good overview of the paper is in an accompanying article “Visual Tuning May Boost African Cichlid Diversity” by Robin Meadows:

“African cichlid fish form new species faster than any other vertebrates, with hundreds of species evolving within the last 2 million years in Lake Malawi and within the last 120,000 years in Lake Victoria. This rapid speciation makes cichlids good models for elucidating the genetic mechanisms behind biodiversity. Vision may play a key role in cichlid evolution, adapting them to forage for new foods or colonize new habitats. Vertebrate retinas have two groups of light-sensitive proteins called opsins: those in rod photoreceptors, which are sensitive to dim light, and those in cone photoreceptors, which are sensitive to color. Changes in the visual system could be due to differences either in the expression of opsin genes or in their DNA sequences. A Research Article in this issue of PLoS Biology by Christopher Hofmann and colleagues suggests that both mechanisms underlie changes in visual sensitivity in cichlids.”

For more on the science, see her summary and see the article itself. Additional information can be found in the press release from U. MD

But what I wanted to cover here was some of the story behind the science. So I emailed the authors some questions which they were kind enough to answer and I post the details here. There are some really interesting tidbits in these answers in my opinion, including how they dealt with merging two papers into one, and how difficult (but fun) it is to do this field work in Lake Malawi.

1. What led you to do the study reported in the paper?

From Karen Carleton:

This study was a long time in the making. We started studying the visual system of cichlids in the 1990’s. We learned quickly that there was a lot of variation in opsin expression within the Lake Malawi species. However, we had only examined a few species. In 2005, Tom Cronin and Justin Marshall (world experts on aquatic visual systems) agreed to come to Lake Malawi with us and help examine a greater number of species. Justin brought his underwater spectrometer and characterized the light environment. Tom and I measured fish colors (that paper is under review) and I extracted retina for quantifying gene expression.

Because Lake Malawi and Lake Victoria both contain large cichlid radiations and had such different light environments, Ole Seehausen and I started working together in 2000 to compare visual systems in Malawi and Victoria. (Ole is the world expert on Lake Victoria cichlids, having helped discover the large rock dwelling species flock that escaped the devastation of the Nile perch). We concentrated on opsin sequences in our previous publications. However, we wanted to look at gene expression as well.

I was fortunate in 2006 to move to U Maryland where Chris Hofmann and Kelly O’Quin joined in our efforts. Chris took on the Victoria cichlid gene expression based on samples that Ole had collected. Kelly became our statistical wizard and analyzed the Malawi data we had gathered. (He has also been working on the visual system of Tanganyikan cichlids, which are the ancestors of the Malawi and Victoria flock. This work has recently been submitted).

From Kelly:

I see Karen gave you a nice review of how this paper was started. As she said, the work was started before I joined her lab. At that time, we were primarily concerned with moving into the new lab at UMCP, so no one was actively working on the data set. I initially analyzed the data to practice for a similar study of Tanganyikan cichlids. But, as I learned more about the power (and pitfalls) of the comparative analysis, I became more and more involved with the actual analysis and discussions, and after about 6 months Karen asked me to write up the paper for the Lake Malawi data set. At the same time Chris was working on a manuscript for the Victorian data. After seeing the overlap in the two papers — really the similarities and differences — Karen and Chris and I decided it would be useful to put the two together.

2. How did this group come together, with people from Australia, Switzerland and Maryland?

From Karen:

Vision science is a small international community that is wonderfully supportive. The cichlid community is also small and makes for excellent collaborations. This is what makes research great – combining expertise from such a diverse group of people. This enables us to think across many disciplines from physics to biology and integrate light measurements, ecology, molecular biology and genetics to try and understand what drives cichlid visual communication and determine how it plays a role in speciation.

From Christopher

I would add that both Europe and Australia have some top people in the field of visual ecology. Also, I don’t think we could have had a paper with such a broad scope without our collaborators. Once we all got together things just kept building and it was very exciting.

3. A question for Kelly — how do you feel about the “joint contribution” statement. Do you think there needs to be a system to truly list two first authors or do you think this statement will suffice?

From Karen

I feel like I should chime in here. We originally had written two separate papers with Chris as lead author on the Victoria data and Kelly on the Malawi data. However, we all felt a combined paper could be more powerful. I asked Kelly and Chris to combine these papers, though that was a very difficult thing to ask, particularly in these times of first author is best. However, this paper is truly the joint effort of these two as well as the rest of the authors and would not be the paper that it is without everyone’s contributions and perspectives.

From Kelly

It is nice to be recognized for the work and effort given, and presumably this is accomplished in the ‘Author Contributions’ statement as well as the order in which authors are listed. For this paper, Chris and I each authored manuscripts that Chris had to painstakingly combine. After a lot of debate over the meaning and limits of our comparative results, we each wrote a new draft of the combined study that Karen then resolved into a single manuscript. Tom, Justin, and Ole provided lots of comments and additional text throughout this process as well. This truly was a collaborative effort, with plenty of contribution and compromise on everyone’s part. Although the order in which the authors are listed cannot possibly communicate all of the nuances involved (though I am certainly happy with the order given), I hope we were able to address them with the ‘joint contribution’ statement you mention, as well as our ‘Author contributions’ statement (which lists just about every author under each category).

In short, I don’t think a simple change to the way that we list authors will ever capture all of the individual and combined efforts that go into a study. Instead, I think we need to change the way we read and interpret that list.

4. How did you end up choosing PLoS Biology as a place to submit the paper to? Were there any debates among the group about publishing there?

From Karen:

Online journals, such as PLoS Biology, give us a lot of flexibility to include all the supporting data without limiting the length of the paper.

From Kelly:

Since we had essentially two large studies here, the generous space and supplemental information limits allowed by PLoS made it a natural choice to publish in.

5. Do you have any good stories about the field work?

From Karen:

Field work in Malawi is never dull. Getting there is the first problem. It is a 24 hr plane ride if all goes well (which it never does) plus a 5 hr drive down to the lake, partly on Malawi dirt roads. Once you get there, however, the lake is a beautiful place. The diving is about the best in the world and it is wonderful to immerse yourself in your organism’s habitat. Underwater, it is wall to wall fish, with 50 or more species in a single location so it is perfect for observing and collecting a wide diversity of species.

The field station is run by the University of Malawi. It is right next to Chembe village and the people there are incredibly warm and friendly. The research station has electricity and cold running water. This is very high living for the village and makes for an interesting dichotomy. Several of the villagers are experts on cichlid fish, including Richard Zatha, and they dive with us. They can catch fish far faster than we can. There is considerable wildlife including the baboons which like to come into the house and steal bread off the table. We were fortunate in not having to deal with hippos or crocodiles on either of our recent trips.

It is quite expensive to take a group to Malawi. However, it is essential for everyone to see their organism in its natural habitat. It also takes a lot of preparation as well to get a group of scuba divers certified and ready to do this kind of field work.

I’m sure Ole has comparable stories for his work in Lake Victoria.

From Christopher:

To build on what Karen said, going to Lake Malawi and actually diving with the fish is an incredible experience. When we work in our aquaculture facility we have maybe a handful of fish from a few different species in a single tank. In the field, once you drop below the surface it is an entirely different world. There are literally hundreds if not thousands of fish from many different species all doing their own thing. Some are eating algae, others plankton and even other fish. Many of these species are ones that are impossible to keep or breed in captivity, which makes the challenges of getting there worthwhile.

From Kelly:

Not really other than to say that it is a lot of hard work. But if you like SCUBA diving in remarkably clear water with beautiful, colorful fish, I can’t think of a better place to work than Lake Malawi.

6. Can you provide links to web sites of the authors and or other links of interest such as videos of the fish, twitter pages, etc?

Our lab web site is: http://cichlid.umd.edu/cichlidlabs/kc/carletonlab.html

Justin Marshall: http://ilc00f.facbacs.uq.edu.au/VTHRC/ecovis/

Tom Cronin: http://www.umbc.edu/biosci/general/user/cronin

Ole’s lab: http://www.eawag.ch/organisation/abteilungen/fishec/index_EN

7. Anything else you want to add:

From Christopher:

I would also add that it’s not easy to catch fish in Malawi. There is a definite art to scuba diving and handling a net. Having local cichlid experts was invaluable.

———————–

Cichlid picture by Christopher Hofmann doi:10.1371/journal.pbio.1000267.g001

Meadows, R. (2009). Visual Tuning May Boost African Cichlid Diversity PLoS Biology, 7 (12) DOI: 10.1371/journal.pbio.1000267

Hofmann, C., O’Quin, K., Marshall, N., Cronin, T., Seehausen, O., & Carleton, K. (2009). The Eyes Have It: Regulatory and Structural Changes Both Underlie Cichlid Visual Pigment Diversity PLoS Biology, 7 (12) DOI: 10.1371/journal.pbio.1000266

Barcoding, taxonomy and citizen CSI

I just love the continued coverage of the story of the students from Trinity School in New York (a high school) who do investigative DNA barcoding projects. (There is a good new story about this on the LA Times blogs at: Think that sheep’s milk cheese comes from a sheep? DNA doesn’t lie | Booster Shots | Los Angeles Times)

In the most recent example, two students, Brenda Tan and Matt Cost, did some home barcoding in collaboration with people from the AMNH and Rockefeller University.

Among their findings:

“an invasive species of insect in a box of grapefruit from Texas”
“what could be a new species or subspecies of New York cockroach”
multiple mislabelled food products including (quoted from the press release, I note)

An expensive specialty “sheep’s milk” cheese made in fact from cow’s milk;
“Venison” dog treats made of beef;
“Sturgeon caviar” that was really Mississippi paddlefish;
A delicacy called “dried shark,” which proved to be freshwater Nile perch from Africa;
A label of “frozen yellow catfish” on walking catfish, an invasive species;
“Dried olidus” (smelt) that proved to be Japanese anchovy, an unrelated fish;
“Caribbean red snapper” that turned out to be Malabar blood snapper, a fish from Southeast Asia.

And what I find most interesting is that this built upon the work of other students from Trinity, Kate Stoeckle and Louisa Strauss, who had done a restaurant-based barcoding study last year.

This type of work is cool in so many ways. It gets students into science. It is an applied use of taxonomy (though I note, barcoding is not without controversy in the taxonomy community). It is a useful form of citizen science – and may eventually provide a way to keep dishonest sellers on their toes … Kudos to all involved in this.

More coverage of the GEBA "Phylogeny Driven Genomic Encyclopedia"

Just a quick note here to post some links to additional stories about my new paper on “A phylogeny driven genomic encyclopedia of bacteria and archaea” which was published last week in Nature (with a Creative Commons license – which is rare in Nature but is what they use for genome sequencing papers).

Carl Zimmer has an article today in the New York Times “Scientists Start a Genomic Catalog of Earth’s Abundant Microbes” about the paper and the project. In the article he interviews me and Hans-Peter Klenk, who was a co-author and led the culturing part of the project. A few things to note about this:

It is rare to have archaea mentioned in the New York Times.
There is a tree that goes along with the article which is a modified version of the tree we had in our paper. I think theirs is very nice. Kudos to their artist.
There is a quote by Norm Pace generally supportive of the project.
The article mentions the JGI Adopt a Microbe program and even has a shout out to Malcolm Campbell at Davidson College and his recent PLoS One paper where they discuss results from a project where they took one of the genomes from our project and used it as part of a course on genome annotation/analysis.

For some of the story behind the paper see my recent blog post “Story Behind the Nature Paper on ‘A phylogeny driven genomic encyclopedia of bacteria & archaea’ #genomics #evolution”

Other discussions worth checking out

John Timmer’s article for Ars Technica on “Presenting a genomic encyclopedia of bacteria (and archaea)”
The Department of Energy is featuring the project as part of their “National Impact” series: Scientists Launch the Genomic Encyclopedia of Bacteria and Archaea
NYTimes Science Times discussion from Charlie Petit at the Knight Science Journalism tracker

Also see

Archaea Make the Big Time from Genome Technology
Woodland Daily Democrat (local paper): Encyclopedia of microbe genomes released
Cory Golden at The Davis Enterprise wrote a nice story “Researchers urge new take on microbes” – not sure how long this stays online or how people access it though
Microbe World has a bit on it
Leonardo Martin has a really nice round up here
The ScientificBlogging staff have written a bit about it here
R&D mag Sr Editor Paul Livingstone has an interesting take on the story: Obsessive compulsive taxonomy
Green Car Congress with mostly material from the press releases here.
MyCor Web has a nice discussion of the paper

Wu, D., Hugenholtz, P., Mavromatis, K., Pukall, R., Dalin, E., Ivanova, N., Kunin, V., Goodwin, L., Wu, M., Tindall, B., Hooper, S., Pati, A., Lykidis, A., Spring, S., Anderson, I., D’haeseleer, P., Zemla, A., Singer, M., Lapidus, A., Nolan, M., Copeland, A., Han, C., Chen, F., Cheng, J., Lucas, S., Kerfeld, C., Lang, E., Gronow, S., Chain, P., Bruce, D., Rubin, E., Kyrpides, N., Klenk, H., & Eisen, J. (2009). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea Nature, 462 (7276), 1056-1060 DOI: 10.1038/nature08656

Bakke, P., Carney, N., DeLoache, W., Gearing, M., Ingvorsen, K., Lotz, M., McNair, J., Penumetcha, P., Simpson, S., Voss, L., Win, M., Heyer, L., & Campbell, A. (2009). Evaluation of Three Automated Genome Annotations for Halorhabdus utahensis PLoS ONE, 4 (7) DOI: 10.1371/journal.pone.0006291

Story Behind the Nature Paper on ‘A phylogeny driven genomic encyclopedia of bacteria & archaea’ #genomics #evolution

Today is a fun day for me. A paper on which I am the senior author is being published in Nature (yes, the Academic Editor in Chief of PLoS Biology is publishing a paper in Nature, more on that below ..). This paper, entitled, “A phylogeny driven genomic encyclopedia of bacteria and archaea” represents a culmination of years of work by many people from multiple institutions. Today in this blog I am going to do my best to tell the story behind the paper – about the people and the process and a little bit about the science.

First, a brief bit about the science in the paper. In this paper, we (mostly people at the Joint Genome Institute, where I have an Adjunct Appointment – but also people in my lab at UC Davis and at the DSMZ culture collection) did a relatively simple thing – we started with the rRNA tree of life as a guide. Then we identified branches in the bacterial and archaeal portions of this tree where there were no genome sequences available or in progress (this was done mostly by Phil Hugenholtz, Dongying Wu and Nikos Kyrpides). Next we searched for representatives of these “unsequenced” branches in the DSMZ culture collection (a collection of bacteria and archaea that can be grown in the lab). And we identified in total some 200 of these. And then the DSMZ (under the direction of Hans-Peter Klenk) grew these organisms and sent the DNA to the Joint Genome Institute. And then JGI turned on their genome sequencing muscle and sequenced the genomes of the organisms in the DNA samples. And finally, we spent a good deal of time then analyzing the data asking a pretty simple question – are there any general benefits that come from this “phylogeny driven” approach to sequencing genomes compared to what one might find with sequencing just any random genome (after all, any genome sequence could have some value)? The paper describes what we found, which is that there are in fact many benefits that come from sequencing genomes from branches in the tree for which genomes are not available.
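For readers who like to see this kind of thing in code, here is a minimal sketch of the selection idea in Python. It is an illustration only, not the pipeline used for the paper: the toy tree, the branch lengths, the taxon names and the function names (`path_edges`, `phylogenetic_diversity`, `pick_genomes`) are all made up, and it assumes that the "benefit" of a new genome is measured as added branch length (phylogenetic diversity) on a rooted tree, with candidates picked greedily.

```python
# Toy rooted tree: each node maps to (parent, branch length to parent).
# All names and lengths here are hypothetical, purely for illustration.
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "n1": ("root", 0.5),
    "C": ("n2", 2.0),
    "D": ("n2", 0.3),
    "n2": ("root", 1.5),
}


def path_edges(tree, leaf):
    """Return the edges (named by their child node) from a leaf up to the root."""
    edges = set()
    node = leaf
    while node in tree:
        edges.add(node)
        node = tree[node][0]
    return edges


def phylogenetic_diversity(tree, taxa):
    """Total branch length covered by the root-to-leaf paths of the given taxa."""
    covered = set()
    for taxon in taxa:
        covered |= path_edges(tree, taxon)
    return sum(tree[edge][1] for edge in covered)


def pick_genomes(tree, sequenced, candidates, k):
    """Greedily pick up to k candidates that add the most new branch length."""
    current = set(sequenced)
    remaining = set(candidates) - current
    chosen = []
    for _ in range(k):
        if not remaining:
            break
        base = phylogenetic_diversity(tree, current)
        # sorted() makes ties resolve deterministically (alphabetical order)
        best = max(
            sorted(remaining),
            key=lambda c: phylogenetic_diversity(tree, current | {c}) - base,
        )
        chosen.append(best)
        current.add(best)
        remaining.discard(best)
    return chosen


picks = pick_genomes(TREE, sequenced={"A"}, candidates=["B", "C", "D"], k=2)
```

With "A" already sequenced, the sketch favors "C" first because it sits on a long branch with no sequenced relatives, which is the same intuition as going after the "unsequenced" branches of the rRNA tree.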

More on the details of the science below. But first, I want to note that this paper was truly an amazing team effort, with all sorts of people from the JGI in particular, going above and beyond the call of duty to make sure it happened and worked well. And the Department of Energy has been truly phenomenal in my opinion in supporting this project which in the end is not explicitly about “energy” per se but is really about providing a reference set of genomes that should improve the value of all microbial genome data.

Anyway, now for the story behind the story. And be prepared, because this is a bit long. But I think it is important to place this work in a bigger context both in terms of my background as well as some of the background of other people in the project. If you can’t wait for more on the GEBA project then perhaps you should go to some of these links:

Videos of talks I have given on the project:

Podcast of interview of me for ASM’s Meet the scientist
Stories about GEBA

Nature News from 11.17.2009

Stories about our paper

Nature News
GenomeWeb “GEBA Researchers Publish Results from Dozens of Bacterial, Archaeal Genomes”
Ars Technica article “Presenting a genomic encyclopedia of bacteria (and archaea)” by John Timmer
Iddo Friedberg blogged about it
The OpenHelix Blog on it
Leonardo Martins blogs about it here and helps translate a Spanish story about the project
R&D magazine has a post based on the press releases here
NY Times story by Carl Zimmer here.

FriendFeed Discussions here (includes a thread about Nature using a Creative Commons license)

And I will post more links as they come up. Below what I try to provide is some of the story behind the story:

My personal interest in applied uses of phylogenetics stage 1: undergraduate preparation at Harvard
As this paper is primarily about an applied use of phylogenetics (in selecting genomes for sequencing), I thought it would be worth going into some of how I personally became a bit obsessed with applied uses of phylogenetics. For me, my obsession began as an undergraduate at Harvard where I got exposed to the value of phylogeny as a tool from many many angles including but not limited to:

Freshman year taking a course taught by Stephen Jay Gould where Wayne and David Maddison were Teaching Assistants and where they were demoing their new phylogenetics software called MacClade
Sophomore year taking a conservation biology class with Eric Fajer and Scott Melvin where I was exposed to the concept of “phylogenetic diversity” as a tool in assessing conservation plans
Junior year working in the lab of Fakhri Bazzaz with people like David Ackerly and Peter Wayne who made use of phylogeny as a key tool in their research projects
Senior year and the year after graduating where I worked in the lab of Colleen Cavanaugh using rRNA based phylogenetic analysis to characterize uncultured chemosynthetic symbionts. I note it was in Colleen’s lab that I also became obsessed you could say with microbes and why they rock.

My personal interest in applied uses of phylogenetics stage 2: graduate school at Stanford

All of this and more gave me a strong passion for phylogeny as a tool. And so I went to graduate school at Stanford (originally to work with Ward Watt on butterflies, but then I switched to working in Phil Hanawalt’s lab on the “Evolution of DNA repair genes, proteins and processes”). And while in that lab I became pretty much obsessed with three things, all related to phylogeny.

First, I was interested in whether the rRNA tree of life, which I had used in my studies in Colleen Cavanaugh’s lab (and in my first paper in J. Bacteriology, which, thanks to ASM, is now in Pubmed Central and free at ASM’s site too), was robust or, as some critics argued, was not that useful. This was a critical question since the best way to study the phylogeny of microbes at the time, and also the best way to study uncultured microbes, was to leverage the ability to clone rRNA genes by PCR and then to build evolutionary trees of those rRNA genes. As part of my graduate work, I did a study where I compared the phylogenetic trees of rRNA to trees of another gene from the same species (I chose recA). Surprisingly, despite the claims that the rRNA tree was not very useful and that different genes always gave different trees, if you compared the two trees ONLY where there was strong support for a particular branching pattern, the trees of the two genes were in fact VERY VERY similar (a finding that had been suggested previously by others, including Lloyd and Sharp).
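To make the "compare ONLY where there is strong support" idea concrete, here is a rough Python sketch. It is not the analysis from the recA study: the two toy trees, the taxon labels, the support values and the 70% cutoff are all invented for illustration, and it assumes rooted trees written as a map from each clade (a set of taxa) to a bootstrap-style support value.

```python
def compatible(clade_a, clade_b):
    """Two rooted clades can coexist in one tree if they are nested or disjoint."""
    return clade_a <= clade_b or clade_b <= clade_a or not (clade_a & clade_b)


def strongly_supported_conflicts(tree1, tree2, threshold=70):
    """List pairs of clades that conflict, using only clades at or above threshold."""
    strong1 = [clade for clade, support in tree1.items() if support >= threshold]
    strong2 = [clade for clade, support in tree2.items() if support >= threshold]
    return [(a, b) for a in strong1 for b in strong2 if not compatible(a, b)]


# Hypothetical trees for five taxa A-E; the support values are made up.
# rRNA-like tree: (((A,B),C),(D,E))    second-gene tree: (((A,B),(C,D)),E)
rrna_tree = {frozenset("AB"): 95, frozenset("ABC"): 88, frozenset("DE"): 45}
reca_tree = {frozenset("AB"): 90, frozenset("CD"): 40, frozenset("ABCD"): 55}

all_conflicts = strongly_supported_conflicts(rrna_tree, reca_tree, threshold=0)
strong_conflicts = strongly_supported_conflicts(rrna_tree, reca_tree, threshold=70)
```

In this toy case the two trees disagree in three places if every branch is taken at face value, but none of those disagreements involve well-supported branches on both sides, which is the flavor of the result described above.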

Second, I also became obsessed with the fact that most of the experimental studies of DNA repair processes were in a very narrow sampling of the phylogenetic diversity of organisms (e.g., at the time, no studies had been done in archaea, and most studies in bacteria were from only two of the many major groups). So I started experimental studies of repair in halophilic archaea in order to help broaden the diversity of studies. And I began to use PCR to try and clone out repair genes from various other species of diverse bacteria and archaea. Alas, as I was doing this, some institute called TIGR was sequencing the complete genomes of organisms I was trying to clone out single genes from. In fact, one of the first organisms I was working on for PCR studies was Archaeoglobus fulgidus. And when I found out TIGR was sequencing the genome, in a project led by none other than the great microbial evolutionary biologist Hans-Peter Klenk (yes, the same one who helped us in this GEBA project), I decided it was silly to try to clone out individual genes by PCR. And thus I began to learn how to analyze genomes.

It was in the course of learning how to analyze genomes that I came up with another applied use of phylogeny. I realized that one should be able to use phylogenetic studies of genes to help in predicting functions for uncharacterized genes as part of genome annotation efforts. And so I wrote a series of papers showing that this in fact worked (I did this first for the SNF2 family of proteins and then alas coined my own omics word “phylogenomics” to describe this integration of genome analysis and phylogenetics and formalized this phylogenomic approach to functional prediction). I note that what I was arguing for was that protein function could be treated like ANY other character trait and one could use character trait reconstruction methods (which I had learned about while playing with that MacClade program) to infer protein functions for unknown proteins in a protein tree. I note that this notion of predicting protein function from a protein tree is completely analogous to (and one could rightfully say stolen from) how researchers studying uncultured microbes were trying to infer properties of microbes from the position of their rRNA genes in the rRNA tree of life (as I had learned in studies of symbioses).
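Here is a small Python sketch of what treating function as a character trait can look like. To be clear about what is assumed: I am using Fitch parsimony as a stand-in for character reconstruction in general (it is not necessarily the method used in the papers mentioned above), the gene names and function labels are hypothetical, and the tree is a rooted binary tree written as nested tuples.

```python
def up(node, known):
    """Bottom-up Fitch pass: candidate states for a node (None = no information)."""
    if isinstance(node, str):
        return {known[node]} if node in known else None
    left, right = (up(child, known) for child in node)
    if left is None:
        return right
    if right is None:
        return left
    # Fitch rule: keep the intersection if there is one, otherwise the union
    return (left & right) or (left | right)


def predict(node, known, parent_state=None, out=None):
    """Top-down pass: give each leaf one state, preferring the parent's state."""
    if out is None:
        out = {}
    states = up(node, known)
    if states is None or parent_state in states:
        state = parent_state
    else:
        state = min(states)  # arbitrary but deterministic tie-break
    if isinstance(node, str):
        out[node] = state
    else:
        for child in node:
            predict(child, known, state, out)
    return out


# Hypothetical protein tree: orfX is uncharacterized and groups with geneA/geneB.
protein_tree = ((("geneA", "geneB"), "orfX"), ("geneC", "geneD"))
known_functions = {
    "geneA": "DNA repair",
    "geneB": "DNA repair",
    "geneC": "transcription",
    "geneD": "transcription",
}

predictions = predict(protein_tree, known_functions)
```

In this sketch `orfX` picks up "DNA repair" because of where it sits in the tree, not because of which single sequence it happens to be most similar to. The sketch recomputes the bottom-up pass at every node, which is wasteful but keeps it short.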

My personal interest in applied uses of phylogenetics stage 3: my plans for a post doc

So as I was wrapping up graduate school I was seeking a way to go beyond what I was doing and combine studies of DNA repair and evolution and microbiology in another way. And I thought I had found a perfect one in a post doc I accepted with A. John Clark at U. C. Berkeley. John was the person who had discovered recA, the gene I had been using for phylogenetic analysis and for structure function studies. And he was working with none other than Norm Pace and a young hotshot in Norm’s lab, Phil Hugenholtz (as well as a few others including Steve Sandler) in trying to use the recA homolog in archaea as a marker for environmental studies of archaea. It sounded literally perfect. And so I was excited to start this job. That was, until I met Craig Venter.

Grabbing the TIGR by the tail

While I had been playing around with data from TIGR in the latter years of my time in graduate school, I also got involved in teaching a fascinating class with David Botstein, Rick Myers, David Cox and others. (As an aside, this class was part of a new initiative I helped design at Stanford on “Science, Math and Engineering” for non science majors – an initiative that was a pet project of non other than Condie Rice who was Provost at the time). Anyway, Rick Myers was serving as a host for one Craig Venter when he came and gave a talk at Stanford and somehow I managed to finagle my way into being invited to go out to dinner with Craig. And at dinner, I proceeded to tell Craig that I thought some of the evolution stuff he was talking about was bogus and I tried to explain some of my work on phylogeny and phylogenomics. Not sure what Craig thought of the cocky PhD student drawing evolutionary trees on napkins, but it eventually got me a faculty job at TIGR and I worked extensively with Craig so it must have been worth something. And so I and my fiancé Maria-Inés Benito (now wife …) moved to Maryland and spent eight great years there (my wife started in MD as a faculty member at TIGR too, but then she left to go to a company called Informax, may it rest in peace).

Most of my work at TIGR focused on a different side of phylogenomics than represented in the GEBA project. At TIGR I focused on the uses of evolutionary analysis as a component of analyzing genomes – from predicting gene function to finding duplications (e.g., see the V. cholerae genome paper) to identifying genes under unusual patterns of mutation or selection to finding organelle derived genes in nuclear genomes (e.g., see this) to studying the occurrence of lateral gene transfer or the lack of occurrence of it to studying genome rearrangement processes. And sure, every once in a while I worked on a project where the organism was the first in its major branch to have a genome sequenced (e.g., Chlorobi). And I had noted, along with others, that there was a big phylogenetic bias in genome sequencing projects (e.g., see my 2000 review paper discussing this here).

But that did not really drive my thinking about what genome to actually sequence until TIGR hired a brilliant microbial systematics expert Naomi Ward as a new faculty member. And it was Naomi who kept emphasizing that someone should go about targeting the “undersequenced” groups in the Tree of Life.

NSF Assembling the Tree of Life grant
And so Naomi and I (w/ Karen Nelson and Frank Robb) put together a grant for the NSF’s “Assembling the Tree of Life” program to do just this – to sequence the first genomes from eight phyla of bacteria for which no genomes were available but for which there were cultured organisms. Amazingly we got the grant. And we did some pretty cool things on that project, including sequencing some interesting genomes, and developing some useful new tools for analyzing genomes (e.g., STAP, AMPHORA, APIS). And I was able to hire some amazing scientists to work in my lab on the project including Dongying Wu (the lead author on the GEBA paper) and Martin Wu (who also worked on the GEBA project and is now a Prof. at U. Virginia) and Jonathan Badger. Alas, we did not publish any earth shattering papers as part of this NSF Tree of Life project on analyzing the genomes of these eight organisms, not because there was not some interesting stuff there but for some other reasons. First, I moved to UC Davis and there was a complicated administrative nightmare in transferring money and getting things up and running at Davis on this project so my lab was not really able to work on it for two years (in retrospect, what a f*ING nightmare dealing with the UC Davis grants system was …).

Then, just as things were ready to get restarted, TIGR kind of imploded and many of the people, including Naomi, my CoPI, left (though I note, my moving to Davis was unrelated to the dissolution of TIGR). But perhaps most importantly, there were some actual technical and scientific problems with our dreams of changing the world of microbiology from our phyla sampling project – the science was not quite there. In particular, having a single genome from each of these phyla was simply not enough to get (and show) the benefits that can come from improved sampling of the tree of life. And thus though we have published some cool papers from this project (e.g., see this PLoS One paper on one of the genomes), we all ended up in one way or another, disappointed with the final results.

Davis and JGI: the return of phylogeny to genomic sampling

When I moved to UC Davis I also was offered (and accepted) an Adjunct Appointment at the Joint Genome Institute (JGI). At both places, I envisioned reinventing myself as someone who worked on studying microbes directly in the environment (e.g., with metagenomics) and symbioses (both of which I had started on at TIGR). And in fact, in a way, I have done this, since I got some medium to big grants to work on these issues. I tried diligently to attend weekly meetings at the JGI but it was difficult since technically I was 100% time at UC Davis and was in essence supposed to be at 0% time at JGI. And when JGI hired Phil Hugenholtz to run their environmental genomics/metagenomics work, I was needed less at JGI since, well, Phil was so good. It was great to go over there and interact with Eddy Rubin, Phil Hugenholtz, and Nikos Kyrpides, among others, but it was unclear what exactly I would do there with Phil running the metagenomics show.

And then, like magic, something came up. I went to one of the monthly senior staff meetings at JGI and while we were listening to someone on the speaker phone, Eddy Rubin handed me a note asking me what I thought about the proposal someone was making to sequence all the species in the Bergey’s Manual. And the light bulb of phylogeny went back on in my head. I told him (I think I wrote it down, but may have said out loud), something like “well, sequencing all 6000 or so species would be great, but it would be better to focus on the most phylogenetically novel ones first.” And in a way, GEBA was born. Eddy organized some meetings at JGI to discuss the Bergey’s proposal and I argued for a more phylogeny driven approach. And this is where having Phil Hugenholtz and Nikos Kyrpides at JGI was like a perfect storm. You see, both had been lamenting the limited phylogenetic coverage of genomes for years, just like I had. Phil had even written a paper about it in 2002 which we used as part of our NSF Tree of Life proposal. And Nikos too had been diligently working for years to make sure novel organisms were sequenced. So though we went to a meeting to discuss the Bergey’s manual idea, we instead proposed an alternative – GEBA.

And for some months, we pitched this notion to various people including at JGI, DOE, and various advisory boards. And the response was basically – “OK – sounds like it COULD be worth doing – why don’t you do a pilot and TEST if it is worth doing.” And so, with support from Eddy Rubin and DOE, that is what we did.

One key limitation – getting DNA

So Phil, Nikos and I and a variety of others started working on the general plan behind GEBA. But there was one key limitation. How were we going to get DNA from all these organisms? One possibility was to seek out diverse people in the community and have them somehow help us. This had some serious problems associated with it, not the least of which was the worry that we might end up sequencing varieties of organisms that people had in their lab but which nobody else had access to (something Naomi Ward and I had written about as a problem a few years before).

And here came the second perfect storm – none other than Hans-Peter Klenk (yes, the same one who had led some of the early genome sequencing efforts when he was at TIGR), was visiting JGI. And he had a relatively new job – at the German Culture Collection DSMZ (In fact, I should note, I had tried to get a job at TIGR even before I met Venter, since they had a position advertised for a “microbial evolutionary biologist” — but that job went to Klenk). Phil Hugenholtz had asked the Head of DSMZ, Erko Stackebrandt, if they might be interested in helping us grow strains and get DNA but we did not yet have a full collaboration with them. And Erko had suggested we contact Hans-Peter. And in his visit to JGI it became apparent that he would do whatever he could to help us build a collaboration with DSMZ. And thus we had a source of DNA. Even more amazingly to me, they did it all for free.

GEBA begins

And thus began the real work in the project. Phil used his expertise with rRNA databases, especially GreenGenes, to pull out phylogenetic trees of different groups. And Nikos used his expertise as the curator of a database on microbial sequencing projects (called GenomesOnline) to help tag which branches in Phil’s tree had sequenced genomes or ones in progress. And then they looked for whether any of the members of the unsequenced branches had representatives in the DSMZ collection. And with some help from Dongying Wu and me, we came up with a list. And with the help of the JGI “Project Management” team including David Bruce and Lynne Goodwin and Eileen Dalin and others at JGI we developed a protocol for collaborating with DSMZ and getting DNA from them.
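The selection logic described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the actual GEBA pipeline: the branch names, organism names, and the `pick_candidates` function are all hypothetical, and the real work relied on rRNA trees from GreenGenes, project records from GenomesOnline, and the DSMZ catalog rather than small in-memory sets.

```python
def pick_candidates(branches, sequenced, in_collection):
    """For each branch with no sequenced (or in-progress) genome,
    pick one member that is available from the culture collection."""
    picks = {}
    for branch, members in branches.items():
        # Skip branches that already have at least one genome project.
        if any(m in sequenced for m in members):
            continue
        # Keep only members the culture collection can grow for us.
        available = sorted(m for m in members if m in in_collection)
        if available:
            picks[branch] = available[0]  # arbitrary but reproducible choice
    return picks


# Hypothetical toy data standing in for the tree, the genome project list,
# and the culture collection holdings.
branches = {
    "branch_1": ["org_a", "org_b"],
    "branch_2": ["org_c", "org_d"],
    "branch_3": ["org_e"],
}
sequenced = {"org_a"}                 # branch_1 is already covered
in_collection = {"org_b", "org_d"}    # org_c and org_e cannot be obtained

candidates = pick_candidates(branches, sequenced, in_collection)
print(candidates)
```

In this toy case only branch_2 yields a candidate: branch_1 already has a genome, and branch_3 has no member in the collection.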

And I became the chief cheerleader and administrator of the project, in part since Phil and Nikos were so busy with their other things at JGI. And though I was not always on the ball, the project moved forward and we started to get genomes sequenced using the full strength of the JGI as a genome center. The finishing teams at JGI worked diligently on finishing as many of the genomes as possible. And Nikos’ team at JGI made sure that the genomes were annotated. And I helped make sure that the data release policies were broadly open (which everyone at JGI supported). And after many false starts with papers on the project that were way way way too cumbersome and big, with some kicks in the pants from the director of JGI Eddy Rubin who was getting anxious about the project, we turned out the GEBA paper that was published today in Nature.

You might ask why I, as a PLoS official and PLoS cheerleader, ended up having a paper in Nature. Well, in the end, though I am senior author on the paper, the total contribution to the work mostly came from people at JGI who did not work for me but instead worked with me on this great project. And we took some votes and had some discussions and in the end, despite my lobbying to send it to PLoS Biology, submitting it to Nature was the group decision. I supported this decision in part due to the fact that Nature uses a Creative Commons license for genome papers. But I also supported it because in the end, this was a collaboration involving many many many people and in such projects everyone has to compromise here and there. Now mind you, I am not sad to have a paper in Nature. But I would personally have preferred to have it in a journal that was fully open access, not just occasionally open like Nature.

Now I note, there were a million other things that went on associated with the GEBA project. Some of which I was not even involved in in any way. I will try to write about some of these another time, but this post is already way way way too long. So I am going to just stop here and add that I have been honored and lucky to work with people like Phil, Nikos, Hans-Peter, and others on this project and to have the people at the JGI work so hard on the background parts of this project. Thanks to all of them and to the people at DSMZ and in my lab who helped out and to the DOE for funding this work (as well as the Gordon and Betty Moore Foundation, who funded some of the work from my lab on analysis of these genomes). And last but not least, thanks to the Director of JGI Eddy Rubin, for supporting this project and for being patient with it and for kicking us in the pants when we needed to get moving on getting a paper out.

Story behind the story for new #PLoSOne paper on Bayesian phylogenetics

There is an interesting new paper in PLoS One, “Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics” by Bryan Kolaczkowski and Joseph Thornton. The work focuses on methods for inferring phylogenetic history and in particular two types of statistical approaches: Likelihood and Bayesian. These methods are related to each other in that both use statistical models of evolution and then test different possible phylogenetic trees relating a set of taxa by how well certain data sets about those taxa map onto the different possible trees. What they did in this new paper was test these methods, with some simulations and with some mathematical analyses. And somewhat surprisingly, they find that Bayesian methods, which have become more popular recently, appear to be more prone to errors than likelihood methods when the data sets have multiple not closely related taxa with long branches. (Note if you want to learn more about phylogenetic methods, you can look at the online chapter (html format or PDF) from my Evolution Textbook, though I confess this needs a bit of revision, which I am working on now).

What they see in these cases is that the taxa with long branches group together, something known generally as “Long Branch Attraction” (LBA). Though there have been many previous studies of LBA, most have ended up showing that statistical methods are less prone to this problem than other phylogenetic methods, like distance and parsimony methods. What is surprising in this new work is that they find that Bayesian methods are highly prone to LBA – and much more so than likelihood methods.
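For readers who have not seen LBA in action, here is a minimal sketch of the classic version of the problem, the one that affects parsimony. It is not the Bayesian analysis from the paper. It assumes a toy two-state character model on four taxa whose true tree is ((A,B),(C,D)), with long branches leading to A and C; the function names and the change probabilities (0.4 for long branches, 0.05 for short ones) are made up for illustration. The code simulates sites and counts which grouping each parsimony-informative site supports.

```python
import random


def flip(state, p, rng):
    """Return the state at the end of a branch: it changes with probability p."""
    return state ^ int(rng.random() < p)


def count_patterns(n_sites, p_long=0.4, p_short=0.05, seed=1):
    """Simulate binary sites on the true tree ((A,B),(C,D)) and count the
    parsimony-informative site patterns supporting each possible grouping."""
    rng = random.Random(seed)
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for _ in range(n_sites):
        x = rng.randint(0, 1)          # ancestor of A and B
        a = flip(x, p_long, rng)       # long branch to A
        b = flip(x, p_short, rng)      # short branch to B
        y = flip(x, p_short, rng)      # short internal branch to ancestor of C and D
        c = flip(y, p_long, rng)       # long branch to C
        d = flip(y, p_short, rng)      # short branch to D
        if a == b and c == d and a != c:
            counts["AB|CD"] += 1       # supports the true tree
        elif a == c and b == d and a != b:
            counts["AC|BD"] += 1       # groups the two long branches
        elif a == d and b == c and a != b:
            counts["AD|BC"] += 1
    return counts


long_branch_counts = count_patterns(20000)
equal_branch_counts = count_patterns(20000, p_long=0.05)
print("long branches to A and C:", long_branch_counts)
print("all branches short:      ", equal_branch_counts)
```

With the long branches in place, more sites support the incorrect A plus C grouping than the true A plus B grouping, so a method that just counts such sites is pulled toward the wrong tree, and adding more data only makes it more confident. When all branches are short, the true grouping wins.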

Anyway, for more on this one could read the paper. But what I thought might be interesting is to ask the authors for more detail directly. I am hoping to do this more and more with PLoS papers in the future. I was inspired to do this, in fact, by one of the authors of this paper, Joe Thornton. He sent me an email with a link to the paper saying he thought I might be interested in it (true) and that he felt that it was his job in part for a PLoS One paper to make sure it got read by the right audience so he was hoping I might blog about it. And I said sure, but only if he gave me some of the “story behind the story”. So here it is below:

Why did you do these experiments?

Why did we do these experiments? A few years ago, we were studying the behavior of Bayesian posterior probabilities on clades — whether or not they accurately predict the probability that a clade is true, and what kinds of conditions might cause them to deviate from this ideal. We found that when the true tree was in the Felsenstein zone (two non-sister long branches separated by short branches), the long branches were often incorrectly grouped together with strong support. This was just a small part of a much larger paper that was published in MBE in 2008. The suggestion that Bayesian inference (BI) might be biased in favor of a false tree was surprising and intriguing, because we — like most people in the field — had assumed that BI would have the desirable statistical properties of ML (e.g., nearly unbiased inference and statistical consistency — convergence on the true tree with increasing support as the amount of data grows and the evolutionary model is correct, etc.). So we began doing experiments to rigorously explore the nature of the bias and its causes. When we found that BI was statistically inconsistent and the cause was integrating over branch lengths, we knew this result would be controversial, so we wanted to be sure the experiments were truly airtight. We supplemented our initial simulations with analyses of empirical data, with simulations under a wide variety of conditions using all types of priors, as well as mathematical and numerical analyses to clearly demonstrate the reasons for the bias. We also developed software that was identical to fully Bayesian MCMC except that it does not integrate over branch lengths; this method is not subject to the bias that BI displays, clearly demonstrating the cause of the bias.

Why did you send this to PLoS One?

Why did we submit to PLoS One? We think this paper has profound implications for phylogenetic practice and theory, and we want it to have a wide audience. Our experience with the review process in phylogenetic methods, unfortunately, is that many reviewers evaluate manuscripts based on whether or not the results confirm their world-view. This is a legacy of decades of internecine warfare in the field between the adherents of different methodological camps. We write papers in other fields, and while peer-review always has its ups and downs, our experience in phylogenetics is unusual in that solid papers are often rejected for philosophical reasons rather than for reasons of scientific validity and quality. We know this paper will be controversial, and we didn’t want it to be shot down in the review process for partisan reasons. PLoS One seemed like the perfect place to get the paper out and let the scientific community evaluate whether the experiments are convincing or not.

This is our first time publishing in PLoS One. I confess to being a little bit anxious that the paper will be lost in the great tide of papers published in the journal. We know our paper is very strong — I think it’s perhaps the most convincing and complete analysis of any problem I’ve ever published — so we’re confident that the work can have an impact, as long as the attention of readers in the field is drawn to it.

Where is the other author these days?

Bryan is now a postdoc in Andy Kern’s lab at Dartmouth.

Kolaczkowski, B., & Thornton, J. (2009). Long-Branch Attraction Bias and Inconsistency in Bayesian Phylogenetics PLoS ONE, 4 (12) DOI: 10.1371/journal.pone.0007891

Creative Commons Licenses adopted at Palo Alto High School

Cool – Creative Commons spreading even to Palo Alto High School – See Paly Voice – Creative Commons Spotlight. According to the article, multiple Palo Alto High publications have adopted CC licenses and are the first high school publications to do so. Good call I say. Plus check out the article which discusses other diverse uses of CC including Nine Inch Nails, PLoS, Wikipedia, and others. Of course, this might have something to do with Lawrence Lessig being from the neighborhood, but that’s OK by me.

US government seeks input on Open Access policies

Quick one here. For all interested in Open Access. Below are some excerpts from an email I received from the folks at PLoS Computational Biology. The main point: the White House Office of Science and Technology Policy is seeking input on broadening public access to publicly funded research …

The White House Office of Science and Technology Policy has recently invited comment on broadening public access to publicly funded research and they want to hear from you. Contributions may be posted to their blog at: http://blog.ostp.gov/2009/12/10/policy-forum-on-public-access-to-federally-funded-research-implementation/

Their Request for Information (RFI) lasts for just 30 days and expires on 7 January 2010, so we’d like to inform you about this important effort and encourage you to get involved in the discussion. This is an opportunity for us to shape a broader public access policy – how it should be implemented, what type of technology and features are needed, and how to manage it.

There are 3 main topics where the administration would appreciate your input (they also welcome general comments) and each one is open for a set period of time:

1. Implementation – expires 20 December 2009 (i.e. on Sunday). Which Federal agencies are good candidates to adopt Public Access policies? What variables (field of science, proportion of research funded by public or private entities, etc.) should affect how public access is implemented at various agencies, including the maximum length of time between publication and public release?

2. Features and Technology – 21-31 December 2009. In what format should the data be submitted in order to make it easy to search and retrieve information, and to make it easy for others to link to it? Are there existing digital standards for archiving and interoperability to maximize public benefit? How are these anticipated to change?

3. Management – 1-7 January 2010. What are the best mechanisms to ensure compliance? What would be the best metrics of success? What are the best examples of usability in the private sector (both domestic and international)? Should those who access papers be given the opportunity to comment or provide feedback?

Hat tip to Karla Heidelberg, Carl Boettiger, and many others who emailed me about this to suggest I post something …

Related things worth looking at:

Federal register announcement about this
Slashdot story on this topic
Alliance for Taxpayer Access
Peter Suber’s Open Access News on the topic
Bora on the topic
