My first PLoS One paper …. yay: automated phylogenetic tree based rRNA analysis
Well, I have truly entered the modern world. My first PLoS One paper has just come out. It is entitled “An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP)” and well, it describes automated software for analyzing rRNA sequences that are generated as part of microbial diversity studies. The main goal behind this was to keep up with the massive amounts of rRNA sequences we and others could generate in the lab and to develop a tool that would remove the need for “manual” work in analyzing rRNAs.

The work was done primarily by Dongying Wu, a Project Scientist in my lab, with assistance from Amber Hartman, who is a PhD student in my lab. Naomi Ward, who was on the faculty at TIGR and is now at Wyoming, and I helped guide the development and testing of the software.

We first developed this pipeline/software in conjunction with analyzing the rRNA sequences that were part of the Sargasso Sea metagenome, and results from that work were in the Venter et al. Sargasso paper. We then used the pipeline and continued to refine it as part of a variety of studies, including a paper by Kevin Penn et al. on coral-associated microbes. Kevin was working as a technician for me and Naomi and is now a PhD student at Scripps Institution of Oceanography. We also had some input from various scientists we were working with on rRNA analyses, especially Jen Hughes Martiny.

We made a series of further refinements and worked with people like Saul Kravitz from the Venter Institute and the CAMERA metagenomics database to make sure that the software could be run outside of my lab. And then we finally got around to writing up a paper …. and now it is out.

You can download the software here. The basics of the software are summarized below: (see flow chart too).

  • Stage 1: Domain Analysis
    • take an rRNA sequence
    • blast it against a database of representative rRNAs from all lines of life
    • use the blast results to help choose sequences to use to make a multiple sequence alignment
    • infer a phylogenetic tree from the alignment
    • assign the sequence to a domain of life (bacteria, archaea, eukaryotes)

  • Stage 2: First pass alignment and tree within domain
    • take the same rRNA sequence
    • blast against a database of rRNAs from within the domain of interest
    • use the blast results to help choose sequences for a multiple alignment
    • infer a phylogenetic tree from the alignment
    • assign the sequence to a taxonomic group

  • Stage 3: Second pass alignment and tree within domain
    • extract sequences from members of the putative taxonomic group (as well as some others to balance the diversity)
    • make a multiple sequence alignment
    • infer a phylogenetic tree
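
The three stages above can be sketched roughly as follows. This is only an illustration of the control flow, not STAP's actual code or interface: every name here (`place`, `run_pipeline`, the `"all_domains"` database label, and the callables for searching, aligning, tree building, classifying and sampling a group) is a hypothetical placeholder that you would wire up to real tools.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical step signatures (assumptions for illustration only).
Search = Callable[[str, str], List[str]]       # (query, database) -> similar sequences
Align = Callable[[Sequence[str]], List[str]]   # sequences -> aligned sequences
BuildTree = Callable[[Sequence[str]], object]  # alignment -> tree
Classify = Callable[[object, str], str]        # (tree, query) -> group label
SampleGroup = Callable[[str], List[str]]       # taxon -> member (and outgroup) sequences


@dataclass
class Result:
    domain: str
    taxon: str
    alignment: List[str]
    tree: object


def place(query: str, database: str, search: Search, align: Align,
          build_tree: BuildTree, classify: Classify) -> Tuple[str, List[str], object]:
    """One search -> align -> tree -> assign pass against a single database."""
    hits = search(query, database)        # similarity search picks candidate sequences
    alignment = align([query, *hits])     # multiple sequence alignment
    tree = build_tree(alignment)          # phylogenetic tree from the alignment
    return classify(tree, query), alignment, tree


def run_pipeline(query: str, search: Search, align: Align, build_tree: BuildTree,
                 classify: Classify, sample_group: SampleGroup) -> Result:
    # Stage 1: assign the sequence to a domain using a database spanning all life.
    domain, _, _ = place(query, "all_domains", search, align, build_tree, classify)
    # Stage 2: first pass within that domain to get a putative taxonomic group.
    taxon, _, _ = place(query, domain, search, align, build_tree, classify)
    # Stage 3: second pass using members of the putative group plus some others.
    members = sample_group(taxon)
    alignment = align([query, *members])
    tree = build_tree(alignment)
    return Result(domain, taxon, alignment, tree)
```

Passing the steps in as functions keeps the sketch independent of any particular search, alignment or tree-building program.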

From the above path, we end up with an alignment, which is useful for things such as counting the number of species in a sample, as well as a tree, which is useful for determining what types of organisms are in the sample.
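
As a rough illustration of how an alignment can feed a species count, here is a minimal sketch that greedily groups aligned sequences by pairwise identity and counts the groups. This is not the method from the paper: the function names are made up for this example, the greedy clustering is a simple stand-in, and the 97% default is just a commonly used convention for species-level groupings of rRNA sequences.

```python
from typing import Iterable, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions, ignoring columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def count_clusters(aligned: Iterable[str], threshold: float = 0.97) -> int:
    """Greedy clustering: a sequence joins the first representative it matches
    at or above the threshold, otherwise it becomes a new representative."""
    representatives: List[str] = []
    for seq in aligned:
        if not any(identity(seq, rep) >= threshold for rep in representatives):
            representatives.append(seq)
    return len(representatives)


if __name__ == "__main__":
    toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTAC-TAC", "TTTTTTTTTT"]
    print(count_clusters(toy_alignment, threshold=0.9))
```

Note that greedy clustering depends on input order, so real analyses typically use more careful methods.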

I note – the key is that it is completely automated, can be run on a single machine or a cluster, and produces results comparable to manual methods. In the long run we plan to connect this to other software that we and other labs develop to build a metagenomics and microbial diversity workflow that will help in the processing of massive amounts of sequence data for microbial diversity studies.

I should note this work was supported primarily by a National Science Foundation grant to me and Naomi Ward as part of their “Assembling the Tree of Life” Program (Grant No. 0228651). Some final work on the project was funded by the Gordon and Betty Moore Foundation through grant #1660 to Jonathan Eisen and the CAMERA grant to UCSD.

Wu, D., Hartman, A., Ward, N., & Eisen, J. (2008). An Automated Phylogenetic Tree-Based Small Subunit rRNA Taxonomy and Alignment Pipeline (STAP) PLoS ONE, 3 (7) DOI: 10.1371/journal.pone.0002566

Tackling the hairy beast – Tetrahymena genome

Just thought I would put out a little self-promotional posting here on a paper we have published today on the genome of a very interesting organism called Tetrahymena thermophila. This organism is a single-celled eukaryote that lives in fresh water ponds.

This species has served as a powerful model organism for studies of the workings of eukaryotic cells. Studies of this species have led to some fundamental discoveries about how life works. For example, telomerase, the enzyme that helps keep the ends of linear chromosomes from degrading, was discovered in this species. This may not seem too important, but many folks think that degradation of chromosome ends in humans is involved in aging. Perhaps even more importantly (to me at least), studies of this species were fundamental to the discovery that RNA can be an enzyme. This discovery of catalytic RNA revolutionized our understanding of how cells work and how life evolved. Tom Cech and Sidney Altman were given the Nobel Prize in 1989 for this discovery.

Many (including myself) believe that having the genome sequence of this species will further spur research and its use as a model organism. In addition, we believe that some of the findings we report in our paper will further cement the importance of this species. For example, this species, though single-celled, encodes nearly as many proteins as humans and possesses many processes and pathways shared with animals but missing from other model single-celled species.

The project that led to this publication was undertaken while I was at TIGR (The Institute for Genomic Research) and involved a collaboration among people at dozens of research institutions around the world. It all started in 2001 when Ed Orias and his colleagues sought to see if anyone at TIGR would be interested in putting in a grant to sequence this species’ genome. I responded to the email saying I was interested, especially since I had interacted with multiple people who used this species as a model system (e.g., Laura Landweber at Princeton and Laura Katz at Smith). So I went to a FASEB meeting where the Tetrahymena Genome Steering Committee was meeting and discussed with them how TIGR might help sequence the genome. And after talking to other genome centers, they selected TIGR to put in a grant proposal with them.

We ended up getting funding from two grant proposals – one from NIGMS and the other from the NSF Microbial Genome Sequencing Program. The sequencing was done in a rapid burst at the new Joint Technology Center which TIGR shares with the Venter Institute. And then we spent ~1.5 years analyzing the sequence data (and assemblies) that came out and in the end we fortunately were able to get our paper into PLoS Biology, in my opinion the best place available to publish biology research.

Importantly, PLoS Biology is Open Access, which allows anyone anywhere to read about our work. This goes well with the free and open release we made of the genome sequence data. In fact, many people published papers on the genome before we did (sometimes scooping us). In the end, I accepted the risks of releasing the genome data with no restrictions in exchange for advancing research on this organism. I think this risk was well worth it, as we still got our big paper published and the field has advanced more rapidly than if we had not released the data.

Eisen, J., Coyne, R., Wu, M., Wu, D., Thiagarajan, M., Wortman, J., Badger, J., Ren, Q., Amedeo, P., Jones, K., Tallon, L., Delcher, A., Salzberg, S., Silva, J., Haas, B., Majoros, W., Farzad, M., Carlton, J., Smith, R., Garg, J., Pearlman, R., Karrer, K., Sun, L., Manning, G., Elde, N., Turkewitz, A., Asai, D., Wilkes, D., Wang, Y., Cai, H., Collins, K., Stewart, B., Lee, S., Wilamowska, K., Weinberg, Z., Ruzzo, W., Wloga, D., Gaertig, J., Frankel, J., Tsao, C., Gorovsky, M., Keeling, P., Waller, R., Patron, N., Cherry, J., Stover, N., Krieger, C., del Toro, C., Ryder, H., Williamson, S., Barbeau, R., Hamilton, E., & Orias, E. (2006). Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote PLoS Biology, 4 (9) DOI: 10.1371/journal.pbio.0040286

Glassy winged sharpshooter symbionts

For those interested in wine production, or symbioses, you may be interested in a paper we published a few days ago. It was on a study we did of bacterial symbionts of an insect known as the glassy winged sharpshooter. This insect is a vector for Pierce’s Disease in grapes – a nasty disease that, if it is found in a vineyard, might lead to the vineyard being sacrificed for the greater good.

Anyway – we did a study of bacteria that live inside the insect, that make nutrients for their insect host, and without which the insect will die. An understanding of these symbionts will hopefully lead to better methods to control the spread of this invasive insect.

Our paper can be found in PLoS Biology here.
A synopsis of our article is here.
An article in Science Now about our study is here.
An article in the Central Valley Business Times is here.
Nature highlighted it in their “Research highlights” section.
An ASM article about this is here.
Link to our collaborator’s lab (Nancy Moran)

For more information about the sharpshooter and Pierce’s Disease, see the following links:

  • Pierce’s Disease Control Program for State of California: here
  • Glassy winged sharpshooter media information here
  • Introduction to Pierce’s Disease here

Wu, D., Daugherty, S., Van Aken, S., Pai, G., Watkins, K., Khouri, H., Tallon, L., Zaborsky, J., Dunbar, H., Tran, P., Moran, N., & Eisen, J. (2006). Metabolic Complementarity and Genomics of the Dual Bacterial Symbiosis of Sharpshooters PLoS Biology, 4 (6) DOI: 10.1371/journal.pbio.0040188