Great paper showing the potential power of comparative and evolutionary genomics in #PLoS Genetics

There is a wonderful paper that has just appeared in PLoS Genetics I want to call people’s attention to: PLoS Genetics: Emergence and Modular Evolution of a Novel Motility Machinery in Bacteria

In the paper, researchers from CNRS and Aix-Marseille in France used some nice comparative and evolutionary genomics analyses along with experimental work to characterize the function and evolution of gliding motility in bacteria.

Their summary of their work:

Motility over solid surfaces (gliding) is an important bacterial mechanism that allows complex social behaviours and pathogenesis. Conflicting models have been suggested to explain this locomotion in the deltaproteobacterium Myxococcus xanthus: propulsion by polymer secretion at the rear of the cells as opposed to energized nano-machines distributed along the cell body. However, in absence of characterized molecular machinery, the exact mechanism of gliding could not be resolved despite several decades of research. In this study, using a combination of experimental and computational approaches, we showed for the first time that the motility machinery is composed of large macromolecular assemblies periodically distributed along the cell envelope. Furthermore, the data suggest that the motility machinery derived from an ancient gene cluster also found in several non-gliding bacterial lineages. Intriguingly, we find that most of the components of the gliding machinery are closely related to a sporulation system, suggesting unsuspected links between these two apparently distinct biological processes. Our findings now pave the way for the first molecular studies of a long mysterious motility mechanism.

Basically, they started with some genetic and functional studies in Myxococcus xanthus.  They analyzed these in the context of the genome sequence (note – I was a co-author on the original genome paper).  And then they did some extensive comparative and evolutionary analysis of these genes, producing some wonderful figures along the way such as:

Figure 2. Taxonomic distribution of the closest homologues of the 14 genes composing the G1, G2, and M1 clusters, and genetic organization of the core complex. (A) For a given gene, the number of homologues in the corresponding genome is indicated by the numbers within arrows. The relationships between the species carrying the different homologues of the genes are indicated by the phylogeny on the left. Based on their taxonomic distribution, the 14 genes can be divided into Group A (grey background) and Group B (white background). (B) In all non Deltaproteobacteria and in Geobacter, the Group B genes clustered in a single genomic region.  doi:10.1371/journal.pgen.1002268.g002  


Based on their analysis they then came up with some hypotheses as to which genes were involved in key parts of gliding motility and what their biochemical functions were, and they then confirmed these with experiments.  I am not going to go into detail on the functional work they did but you can read their paper for more details.

They wrapped up their paper by proposing a model for the evolutionary history of gliding motility.  I am not sure I buy all components of their model since our sampling of genomes right now is still very poor, but they have a pretty detailed theory captured in part in this figure:

Figure 8. Evolution and structure of the Myxococcus gliding motility machinery. A) Evolutionary scenario describing the emergence and evolution of the gliding motility machinery in M. xanthus. The relationships between organisms carrying close homologues of the 14 genes encoding putative components of the gliding machinery in M. xanthus are represented by the phylogeny. Green and red arrows respectively indicate gene acquisition and gene loss. The number of gene copies that were acquired or lost is indicated within arrows. The purple dotted arrows represent horizontal gene transfer events of one or several components. WGD marks the putative whole genome duplication event that occurred in the ancestor of Myxococcales. For each gene, locus_tag, former (agm/agl/agn) and new (glt and agl) names are provided. The number of complete genomes that contain homologues of glt and agl genes compared to the total number of complete genomes available at the beginning of this study are indicated in brackets. (B) The Myxococcus gliding machinery. The diagram compiles data from this work and published literature. Components were added based on bioinformatic predictions, mutagenesis, interaction and localization studies. Exhaustive information is not available for all proteins and thus the diagram largely is subject to modifications once more data will be available. Known interactions within the complex from experimental evidence are AglR-GltG, AglZ-MglA and interactions within the AglRQS molecular motor [13], [15]. For clarity, the proteins were colour-coded as in the rest of the manuscript 

Anyway – I don’t have much time right now to provide more detail on the paper.  But it is definitely worth checking out.

Storification of my notes/tweets from #UCDavis CLIMB Symposium "The infant gut microbiome: prebiotics, probiotics and establishment"

I made a Storify posting for the CLIMB Symposium I participated in yesterday. First I am reposting my summary of what the symposium was about which I posted the day before the meeting:

There is a symposium tomorrow at UC Davis organized by undergraduates in the CLIMB program.  CLIMB stands for “Collaborative Learning at the Interface of Mathematics and Biology (CLIMB)” and is a program that emphasizes hands-on training using mathematics and computation to answer state-of-the-art questions in biology.  A select group of undergraduates participate in the program and this summer the students had to do some sort of modelling project.  Somehow I managed to convince them to do work on human gut microbes.  And they have done a remarkable job.  

As part of their summer work, they organized a symposium on the topic and their symposium takes place tomorrow.  Details are below. 

The Infant Gut Microbiome: Prebiotics, Probiotics, & Establishment 

  • Jonathan Eisen, UC Davis “DNA and the hidden world of microbes”
  • Mark Underwood, UC Davis “Dysbiosis and necrotizing enterocolitis”
  • Ruth Ley, Cornell University “Host-microbial interactions and metabolic syndrome” 
  • CLIMB 2010 cohort “Breast milk metabolism and bacterial coexistence in the infant microbiome”
  • David Relman, Stanford University “Early days: assembly of the human gut microbiome during childhood” 
  • Bruce German, UC Davis

The only major issue for me is I am losing my voice.  So we will see how this goes.  Though I note I have gotten some very sage advice on how to treat my voice problem via the magic of twitter.  If I do not collapse I will also be tweeting/posting about the other talks during the day. 


Anyway – here is the storification:

View “CLIMB Symposium at UC Davis” on Storify: http://storify.com/phylogenomics/climb-symposium-at-uc-davis

Coming Monday at #UCDavis "The Infant Gut Microbiome: Prebiotics, Probiotics, & Establishment"

Just a little announcement here.  There is a symposium tomorrow at UC Davis organized by undergraduates in the CLIMB program.  CLIMB stands for “Collaborative Learning at the Interface of Mathematics and Biology (CLIMB)” and is a program that emphasizes hands-on training using mathematics and computation to answer state-of-the-art questions in biology.  A select group of undergraduates participate in the program and this summer the students had to do some sort of modelling project.  Somehow I managed to convince them to do work on human gut microbes.  And they have done a remarkable job.

As part of their summer work, they organized a symposium on the topic and their symposium takes place tomorrow.  Details are below.

The Infant Gut Microbiome: Prebiotics, Probiotics, & Establishment

Monday, 12 September 2011, 9am-4pm

Life Sciences 1022

UC Davis

9:00-9:10 Introduction

9:10-9:40 Jonathan Eisen, UC Davis

“DNA and the hidden world of microbes”

9:40-10:40 Mark Underwood, UC Davis

“Dysbiosis and necrotizing enterocolitis”

10:40-10:50 break

10:50-11:50 Ruth Ley, Cornell University

“Host-microbial interactions and metabolic syndrome”

11:50-12:00 general discussion

12:00-1:00 lunch

1:00-2:00 CLIMB 2010 cohort

“Breast milk metabolism and bacterial coexistence in the infant microbiome”

2:00-2:10 break

2:10-3:10 David Relman, Stanford University

“Early days: assembly of the human gut microbiome during childhood”

3:10-3:40 Bruce German, UC Davis

3:40-4:00 next steps

The only major issue for me is I am losing my voice.  So we will see how this goes.  Though I note I have gotten some very sage advice on how to treat my voice problem via the magic of twitter.  If I do not collapse I will also be tweeting/posting about the other talks during the day.



Biomed Central web sites do such weird things when viewed in Safari …


A Forest (Rohwer that is) on Black Reefs, Shipwrecks and Coral Reef Conservation

Well, Forest Rohwer is at it again.  He is always doing something I find worth paying attention to.  
First, he does fascinating and pioneering science on viruses in the environment.  For example, consider that he was one of the first, if not the first, to do random shotgun metagenomics from environmental samples.  See his lab’s 2001 and 2002 papers on the topic (Production of shotgun libraries using random amplification and Genomic analysis of uncultured marine viral communities) which I note came out before the Sargasso and Acid Mine Drainage papers which most cite as the first environmental shotgun sequencing pubs.  
In fact, you could say in many ways we do very similar work, except he focuses on viruses.  Not that we always agree mind you. I once gave a talk after him at a meeting and I changed my title to “Seeing the Forest and Missing the Trees” in a little dig at his not using phylogenetic methods in his approach to metagenomic analysis.  But I digress. 
What I want to write about today is a new paper from his lab: Black reefs: iron-induced phase shifts on coral reefs.


Alas, it is not freely available as it is in ISME but is not published under their “open” option.  Am working on getting a link to an available PDF … will let everyone know.

Here is the abstract:

The Line Islands are calcium carbonate coral reef platforms located in iron-poor regions of the central Pacific. Natural terrestrial run-off of iron is non-existent and aerial deposition is extremely low. However, a number of ship groundings have occurred on these atolls. The reefs surrounding the shipwreck debris are characterized by high benthic cover of turf algae, macroalgae, cyanobacterial mats and corallimorphs, as well as particulate-laden, cloudy water. These sites also have very low coral and crustose coralline algal cover and are call black reefs because of the dark-colored benthic community and reduced clarity of the overlying water column. Here we use a combination of benthic surveys, chemistry, metagenomics and microcosms to investigate if and how shipwrecks initiate and maintain black reefs. Comparative surveys show that the live coral cover was reduced from 40 to 60% to 0.75 km2). The phase shift occurs rapidly; the Kingman black reef formed within 3 years of the ship grounding. Iron concentrations in algae tissue from the Millennium black reef site were six times higher than in algae collected from reference sites. Metagenomic sequencing of the Millennium Atoll black reef-associated microbial community was enriched in iron-associated virulence genes and known pathogens. Microcosm experiments showed that corals were killed by black reef rubble through microbial activity. Together these results demonstrate that shipwrecks and their associated iron pose significant threats to coral reefs in iron-limited regions.

Forest and others have recently been studying the Line Islands because they are relatively undisturbed reefs. Here is a short video about the work there (the work in general, not this specific study per se): http://oceantoday.noaa.gov/swf/flowplayer-latest.swf

Anyway, the new paper does something very different.  It focuses on shipwrecks and the impact of these wrecks on reefs.  This is of particular interest because, as indicated in the abstract, the reefs are very low in iron.  And many shipwrecks introduce massive amounts of iron.  What they conclude in this new paper is that the iron from the shipwrecks leads to algal blooms, which lead to rapid killing of / damage to the pristine reefs.

For more on the paper there is an article in National Geographic Newswatch by Enric Sala worth checking out.

Forest also wrote me some information by email.  He states:

Black reefs are associated with shipwrecks or other debris in this region of the world. These sites are interesting both from a conservation and scientific point of view. As a conservation issue, they are amazingly destructive. Kingman, one of the jewels of the USA coral reefs, has lost >1 km of the lagoon in less than 3 years. An old wreck on Fanning atoll has killed about 10% of their reef.

Visually, the black reefs are some of the eeriest places I’ve ever seen. The bottom is completely covered in different algae (including cyanobacterial mats), the water is filled with marine snow, and dark precipitate on the benthos (probably sulfur). We just published a paper in ISME where we have recreated the precipitate, cloudiness, and coral death in microcosms by combining rubble from the black reefs, with corals and an iron addition. Addition of antibiotics blocks the coral death, precipitate, and marine snow, suggesting a microbial role.

The black reefs are probably caused by iron-enrichment from the wrecks and debris. We think black reefs are specific to non-emergent coral reefs, where iron is a limiting nutrient. Our current model is that iron stimulation of algae leads to increased microbial activity and coral death. In support of this, metagenomic analysis of the microbial community showed an enrichment of iron-related pathogenicity factors.

Forest also adds a plea to help in conservation of these reefs.

If you are interested in conservation, then please help us petition Congress to support removal of the wrecks and debris. Please contact Emily Douce at the Marine Conservation Biology Institute.

I encourage people to contact her.

Wondering when doctors’ offices are going to have charts on the microbiome


What is a nice chloroplast like you doing in a parasite like that?

Cool new paper from Joe DeRisi’s lab: PLoS Biology: Chemical Rescue of Malaria Parasites Lacking an Apicoplast Defines Organelle Function in Blood-Stage Plasmodium falciparum, by Ellen Yeh and Joseph L. DeRisi. doi: 10.1371/journal.pbio.1001138

In it they use some experimental techniques to try and track down the elusive function of the apicoplast in Plasmodium falciparum, the causative agent of malaria.  The apicoplast is an organelle that is evolutionarily derived from chloroplasts (and thus derived originally from cyanobacteria).  Due to its cyanobacterial origins, many have thought that it might serve as a good target for drugs to kill Plasmodium species because, in theory, such drugs, if specific, should not have significant detrimental effects on hosts like humans due to our lack of known important cyanobacterial associates.

Here is their abstract:

Plasmodium spp parasites harbor an unusual plastid organelle called the apicoplast. Due to its prokaryotic origin and essential function, the apicoplast is a key target for development of new anti-malarials. Over 500 proteins are predicted to localize to this organelle and several prokaryotic biochemical pathways have been annotated, yet the essential role of the apicoplast during human infection remains a mystery. Previous work showed that treatment with fosmidomycin, an inhibitor of non-mevalonate isoprenoid precursor biosynthesis in the apicoplast, inhibits the growth of blood-stage P. falciparum. Herein, we demonstrate that fosmidomycin inhibition can be chemically rescued by supplementation with isopentenyl pyrophosphate (IPP), the pathway product. Surprisingly, IPP supplementation also completely reverses death following treatment with antibiotics that cause loss of the apicoplast. We show that antibiotic-treated parasites rescued with IPP over multiple cycles specifically lose their apicoplast genome and fail to process or localize organelle proteins, rendering them functionally apicoplast-minus. Despite the loss of this essential organelle, these apicoplast-minus auxotrophs can be grown indefinitely in asexual blood stage culture but are entirely dependent on exogenous IPP for survival. These findings indicate that isoprenoid precursor biosynthesis is the only essential function of the apicoplast during blood-stage growth. Moreover, apicoplast-minus P. falciparum strains will be a powerful tool for further investigation of apicoplast biology as well as drug and vaccine development.


The author summary is a bit nicer in my opinion:

Malaria caused by Plasmodium spp parasites is a profound human health problem that has shaped our evolutionary past and continues to influence modern day with a disease burden that disproportionately affects the world’s poorest and youngest. New anti-malarials are desperately needed in the face of existing or emerging drug resistance to available therapies, while an effective vaccine remains elusive. A plastid organelle, the apicoplast, has been hailed as Plasmodium’s “Achilles’ heel” because it contains bacteria-derived pathways that have no counterpart in the human host and therefore may be ideal drug targets. However, more than a decade after its discovery, the essential functions of the apicoplast remain a mystery, and without a specific pathway or function to target, development of drugs against the apicoplast has been stymied. In this study, we use a simple chemical method to generate parasites that have lost their apicoplast, normally a deadly event, but which survive—“rescued” by the addition of an essential metabolite to the culture. This chemical rescue demonstrates that the apicoplast serves only a single essential function, namely isoprenoid precursor biosynthesis during blood-stage growth, validating this metabolic function as a viable drug target. Moreover, the apicoplast-minus Plasmodium strains generated in this study will be a powerful tool for identifying apicoplast-targeted drugs and as a potential vaccine strain with significant advantages over current vaccine technologies.

Also see their press release here.

Basically they are trying to use various experimental tricks to figure out which functions of the apicoplast are essential.  Many theories have been proposed over the years as to what the apicoplast is doing.  But few have gained significant evidence.  This paper is an important contribution because it suggests that one pathway in particular is most functionally important: the isopentenyl pyrophosphate (IPP) synthesis pathway.  See their model below:

Figure 5. Model of apicoplast function.
(Top) The essential function of the apicoplast is the production of isoprenoid precursors, IPP and DMAPP, which are exported into the cytoplasm and used to synthesize small molecule isoprenoids and prenylated proteins. Parasites that are unable to synthesize isoprenoid precursors either due to inhibition of the biosynthetic pathway by fosmidomycin or loss of the apicoplast following doxycycline inhibition can be chemically rescued by addition of exogenous IPP (red). The exogenous IPP enters the host cell through unknown membrane transporters and fulfills the missing biosynthetic function. (Bottom) Reaction scheme for MEP pathway biosynthesis of IPP and DMAPP with the enzymatic step inhibited by fosmidomycin indicated.

Anyway – I have always been fascinated by apicoplasts because they are so weird.  They reflect a strange evolutionary history of Apicomplexans in that this is a eukaryotic lineage that at some point brought into itself an entire photosynthetic algal cell as a symbiont.  And for reasons still unknown (if there are reasons …) the chloroplast of the algal symbiont was retained while most of the rest of the symbiont was ditched.  So that the resulting cells looked something like this:

From http://wiki.ericmajinglong.com/index.php?title=A_special_case:_The_apicomplexan_plastid

Evolution is indeed very weird.  And ever since it was discovered that the apicoplast was in fact derived from chloroplasts (this was discovered using molecular phylogenetics; e.g., see http://www.sciencedirect.com/science/article/pii/016668519490149X) people have been wondering if it might make a good drug target.  But people have also been wondering – what do Apicomplexans do with a chloroplast-like organelle when they do not photosynthesize?  So the DeRisi paper is interesting from a drug treatment point of view and also from an evolution point of view.

Anyway – here are some other links worth looking at:

My science communication hero/heroine of the month – Dr. Kiki @drkiki

Been working on revising my lab’s web site and was looking for some videos of talks I have given online to post there.  And I discovered/rediscovered this video of an interview I did for Dr. Kiki’s Science Hour.  Here it is:

NOTE – AT LEAST TEMPORARILY REMOVING THE VIDEO DUE TO MALWARE INFECTION OF TWIT.TV SITE

Now I know – this is over a year old. But I just watched the full video. Not so bad I think.

As many of you know, I like to talk.  And talk.  And talk.  But I would like to say that as an interviewer, Dr. Kiki is pretty frigging awesome.  Don’t know how she does it.  But I am going to post this video on the new lab page and point people to it if they want to know what my lab does and what I am interested in.

But enough about me.  I want to thank Dr. Kiki for this great interview by saying a little bit about her.  Or, well, her work in science communication.

As some of you may know, I listen to podcasts of TWIS – This Week in Science frequently on my bike rides to work.  And I really recommend anyone/everyone out there give it a whirl.  It is sort of like Science Friday but it is a bit edgier, a bit funnier, a bit goofier, and a bit sciencier (is that a word?)  Dr. Kiki and Justin on it are great and it is so good that I frequently sit outside my building listening to the end of a show if I take the short ride to work which is less than an hour.  So if you like Science – you really should check out the TWIS web site and find some way to listen such as what I do by subscribing to their podcasts at iTunes.

And I guess now I will be checking out “Dr. Kiki’s Science Hour” more after rewatching this video.  There are many many more shows at twit.tv/kiki.  I have not checked out as many of these as I have TWIS shows but the ones I have watched are great.

And if you want to follow her more directly check out her Blog: The Bird’s Brain, or her twitter feed  (@drkiki)  or her  Google+ feed.

Very proud that she is a UC Davis alum … and just want to say thanks to her for giving me a video I can share with others that says more about me and my lab than almost anything I have written.

Fun with Pubmed central – first paper describing #HeLa cells – Note @rebeccaskloot

Last year, Rebecca Skloot came to Davis to talk about her book “The Immortal Life of Henrietta Lacks”.  Note – if you have not read the book – what f*$ing rock have you been hiding under?  It is in my opinion the best nonfiction book I have ever read.  Seriously.  Not the best science book. The best nonfiction book of any kind.  And I am not alone in this feeling as it has won a bazillion positive reviews and awards.   In summary – it tells three stories – the story of the isolation of HeLa cells, the story of the woman from whom those cells came, and the story of Skloot learning the other two stories.

Anyway – I somehow managed to get her to come to UC Davis to give a talk last year just as the book was going viral.  In preparation for Skloot’s visit I decided to do some sort of “open access” schtick and looked into how many papers about HeLa cells were in Pubmed Central.  Pubmed Central is a database of papers for which the full text is available at no charge.  After this mini-research and after interacting with Rebecca over the last year and seeing the well deserved recognition of her book, I have been a bit fascinated about how much of the literature surrounding studies of HeLa cells is openly and/or freely available.

So today I decided to see what was the earliest HeLa paper that was freely available on PubMed Central.  And I managed to find a good one: Studies on the propagation in vitro of poliomyelitis viruses. IV. Viral multiplication in a stable strain of human malignant epithelial cells (strain HeLa) derived from an epidermoid carcinoma of the cervix. William F. Scherer, Jerome T. Syverton, and George O. Gey.  J Exp Med. 1953 May 1; 97(5): 695–710.

I believe this was the first full paper published discussing HeLa cells.  Nice short title by the way. Anyway – good to see it in Pubmed Central.
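For anyone who wants to repeat this kind of search programmatically rather than through the web site, here is a minimal sketch using NCBI's E-utilities `esearch` endpoint against the `pmc` database. This is not necessarily how the search described above was done. The function names, the search term "HeLa", and the year range are all just illustrative assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmc_search_url(term, min_year, max_year, retmax=20):
    """Build an esearch URL for PubMed Central limited to a publication-date range.

    Hypothetical helper: the parameter choices here are assumptions, not the
    query used in the post.
    """
    params = {
        "db": "pmc",            # search PubMed Central
        "term": term,           # free-text query, e.g. "HeLa"
        "datetype": "pdat",     # filter on publication date
        "mindate": str(min_year),
        "maxdate": str(max_year),
        "retmax": str(retmax),  # how many IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def count_pmc_hits(term, min_year, max_year):
    """Return the number of PMC records matching the query (needs network access)."""
    url = build_pmc_search_url(term, min_year, max_year, retmax=0)
    with urlopen(url) as response:
        result = json.load(response)
    return int(result["esearchresult"]["count"])


if __name__ == "__main__":
    # Only builds the URL; call count_pmc_hits(...) to actually query NCBI.
    print(build_pmc_search_url("HeLa", 1950, 1955))
```

Narrowing the year range step by step is one simple way to home in on the earliest records, though a free-text match on "HeLa" will also pick up papers that merely mention the cells.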

I note  – I could not find in Pubmed Central the meeting abstract about HeLa cells which was published in 1952 but I did find it from Cancer Research’s online archive here.  I have copied the abstract below, for you HeLa history buffs out there:

TISSUE CULTURE STUDIES OF THE PROLIFERATIVE CAPACITY OF CERVICAL CARCINOMA AND NORMAL EPITHELIUM. George O. Gey, Ward D. Coffman* and Mary T. Kubicek* (Departments of Surgery and Gynecology, Johns Hopkins Hospital and University, Baltimore 5, Md.)

This is a report of an evaluation in vitro of the growth potential of normal, early intra-epithelial, and invasive carcinoma from a series of cases of cervical carcinoma. Comparable cytological and tissue culture studies were actually carried out on selected biopsies of normal and neoplastic areas of the same cervix. Thus far, only one strain of epidermoid carcinoma has been established and grown in continuous roller tube cultures for almost a year. It grows well in a composite medium of chicken plasma, bovine embryo extract, and human placental cord serum. The autologous normal prototype is most difficult to maintain under comparable cultural conditions. Most of the tissue from other cases showed rapid keratinization of the cells grown in cultures whether from normal or neoplastic areas. Some of the hormonal aspects of the problem will be discussed.

Anyway – if you are interested in HeLa and/or The Immortal Life of Henrietta Lacks, you may find it interesting to check out these early papers on the topic such as the ones described above.  Here are a few more that are from the early era and are freely available:

Pubmed Central is a rich resource not just for accessing scientific papers but for learning about science history too.  It is a good thing that articles in Pubmed Central are available at no charge and here’s hoping that sometime soon past, present and future science papers will be more readily available to all.

UPDATE ———-
John Hogenesch from U. Penn made a nice figure of relevance and agreed to let me post it:

And he comments “You can see several things, Nixon’s “war on cancer” in the early 70s and the dawn of the cell/molecular biology age in the 80s and expansion in the 90s.”

Thanks John …

More on ‘phylogenomics’ – as in functional prediction w/ phylogeny

There is a new paper out: Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium in Briefings in Bioinformatics.

The paper is interesting and presents a new general approach to using phylogeny for functional prediction of uncharacterized genes. I am interested in this for many reasons including that I was one of the first, if not the first, to lay this out as a concept.  In a series of papers from 1995-1998 I outlined how phylogenetic analysis could be used to aid in functional prediction for all the genes that were starting to be sequenced in genome projects without any associated functional studies (at the time, I referred to all these ESTs and other sequences as an “onslaught” – little did I know what was to come).

My first paper on this topic was in 1995: Evolution of the SNF2 family of proteins: subfamilies with distinct sequences and functions.  The abstract is below:

The SNF2 family of proteins includes representatives from a variety of species with roles in cellular processes such as transcriptional regulation (e.g. MOT1, SNF2 and BRM), maintenance of chromosome stability during mitosis (e.g. lodestar) and various aspects of processing of DNA damage, including nucleotide excision repair (e.g. RAD16 and ERCC6), recombinational pathways (e.g. RAD54) and post-replication daughter strand gap repair (e.g. RAD5). This family also includes many proteins with no known function. To better characterize this family of proteins we have used molecular phylogenetic techniques to infer evolutionary relationships among the family members. We have divided the SNF2 family into multiple subfamilies, each of which represents what we propose to be a functionally and evolutionarily distinct group. We have then used the subfamily structure to predict the functions of some of the uncharacterized proteins in the SNF2 family. We discuss possible implications of this evolutionary analysis on the general properties and evolution of the SNF2 family.



I note – I am annoyed that when I went to the Nucleic Acids Research site for my paper I discovered for some bizarre reason they are now trying to charge for access to it even though it is in Pubmed Central and used to be freely available on the NAR site.  WTF?  Is this just an IT issue like the #OpenGate complaints I made for a while about Nature Genome papers?

Anyway – in that paper in 1995 I basically showed that, at least for this family, phylogenetic analysis could be used as a tool in making functional predictions by allowing one to better identify orthology relationships and subfamilies within the SNF2 superfamily.  This was, I think, at least a little bit novel, but others at the time were also looking into using various analyses to identify orthology relationships across genomes.

Shortly thereafter I started working on the concept that one could use the phylogenetic tree more explicitly in making functional predictions and eventually I laid out the concept of treating function as a character state and doing character state reconstruction using a gene tree to then infer functions for uncharacterized genes.  I called this approach “phylogenomics” in a paper in 1997 in Nature Medicine (the editor asked us to give it a name … and thus my own contribution to the omics word game began).  Alas somehow the title of our paper became “Gastrogenomic delights: a movable feast” since we were writing about the E. coli and H. pylori genomes, so I added yet another omics term at the same time.  In the paper I showed how phylogenetic analysis of the MutS family of proteins could help in interpreting one of the findings in the H. pylori genome paper:

In this paper we showed why BLAST searches were not ideal for inferring relationships among sequences (because BLAST measures similarity NOT evolutionary history per se).  A bit annoyed still that other papers then sort of claimed they were the first to show BLAST was not ideal for inferring evolutionary relatedness, but whatever. This still did not fully describe the phylogeny driven approach that I was working on so I then wrote up an outline of this approach for a paper in Genome Research: Phylogenomics: Improving Functional Prediction for Uncharacterized Genes by Evolutionary Analysis.  This paper really laid out the idea in more detail:

It also gave detailed examples of how similarity searches could be misleading and how phylogenetic analysis should in principle be better.
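
For those who like to see ideas as code, here is a toy sketch of what "treating function as a character state" on a gene tree can look like.  It is not the procedure from any of the papers above.  It simply uses Fitch parsimony as a stand-in reconstruction method, and the tree, the gene names (`g1`, `q1`, ...) and the function labels are all made up for illustration.

```python
# Toy gene tree as nested tuples; leaves are gene names (all hypothetical).
tree = (("g1", ("g2", "q1")), (("g3", "g4"), "q2"))

# Experimentally characterized genes; q1 and q2 are uncharacterized.
known = {"g1": "function X", "g2": "function X",
         "g3": "function Y", "g4": "function Y"}


def fitch_up(node, known, sets):
    """Bottom-up pass: candidate state set for each node (None = no data)."""
    if isinstance(node, str):                      # leaf
        s = {known[node]} if node in known else None
    else:
        left, right = (fitch_up(child, known, sets) for child in node)
        if left is None or right is None:          # ignore uninformative child
            s = left if right is None else right
        elif left & right:                         # children agree
            s = left & right
        else:                                      # children disagree
            s = left | right
    sets[node] = s
    return s


def fitch_down(node, parent_state, sets, out):
    """Top-down pass: pick one state per node, preferring the parent's."""
    s = sets[node]
    if s is None or parent_state in s:
        state = parent_state
    else:
        state = sorted(s)[0]                       # arbitrary, deterministic tie-break
    if isinstance(node, str):
        out[node] = state
    else:
        for child in node:
            fitch_down(child, state, sets, out)


def predict_functions(tree, known):
    """Return a function label for every leaf, characterized or not."""
    sets, out = {}, {}
    fitch_up(tree, known, sets)
    fitch_down(tree, None, sets, out)
    return out


predictions = predict_functions(tree, known)
print(predictions["q1"], "|", predictions["q2"])
```

In this toy case `q1` picks up the function of the subfamily it sits in and `q2` picks up the function of its own subfamily, regardless of which characterized gene either one happens to be most similar to.  Real gene families are messier (duplications, unequal rates, multiple functions per protein), which is why this is only a sketch.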

I note – I am very very proud of this paper.  But it did not do a lot of things.  Really it was about laying out a concept of using tools from phylogenetics in functional prediction.  It did not, for example, provide software.  I later developed some of my own scripts for doing this when I was at TIGR, but really the software for phylogeny driven functional predictions would come later from others like Kimmen Sjolander, Sean Eddy, and Steven Brenner.  Each method laid out in these tools and in other papers had its own flavor, and I continued to explore various approaches and applications to phylogeny driven functional prediction.  Examples of my subsequent work are listed below (with links to the Mendeley pages for these papers):

Plus we (at TIGR) used phylogenetic analysis as a tool in annotation of many many genomes as well as metagenomes.

Anyway, enough of history for a bit.  What is interesting about this new paper is that they take a slightly different approach to phylogeny driven functional prediction in that they make use of Gene Ontology functional annotations as their key parameter to trace on evolutionary trees.  They lay out the differences in their method quite well in the introduction:

Our general approach is similar to the ‘phylogenomic’ method proposed by Eisen [6] and further developed into a probabilistic form by Engelhardt et al. [7], but with important differences. Eisen proposed a conceptual approach for predicting protein function using a phylogenetic tree together with available experimental knowledge of proteins. The original approach relied on manual curation to identify gene duplication events and to find and assimilate the literature for characterized members of the family. Engelhardt et al. used automated reconciliation with the species tree [8] to identify gene duplication events, and experimental GO terms (MF only) to capture the experimental literature. Using this information, they defined a probabilistic model of evolution of MF involving transitions between different molecular functions.

From these previous studies, we adopt the basic approach of function evolution through a phylogenetic tree and the use of GO annotations to represent function. However, unlike these other phylogenomic methods, we represent the evolution in terms of discrete gain and loss events. In Eisen’s original model, an annotation does not necessarily represent a gain of function (it could have been inherited from an earlier ancestor), and losses are not explicitly annotated. The transition-based model of Engelhardt et al. assumes replacement of one function by another (gain of one function coupled to the loss of another), and does not capture uncoupled events, which is particularly important for BP annotations and cases where a protein has multiple molecular functions (see examples below). In addition, we make no a priori assumptions about conservation of function within versus between orthologous groups, or about the relationship between evolutionary distance and functional conservation (as the distance may not necessarily reflect every given function). While, as described below, gene duplication events and relatively long tree branches are important clues for curators to locate functional divergence (gain and/or loss), in our paradigm an ancestral function can be inherited by both descendants following a duplication (resulting in paralogs with the same function) or gained/lost by one descendant following a speciation event (resulting in orthologs with different functions). Evolution of each function is evaluated on a case-by-case basis, using many different sources of information about a given protein family
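
To make the gain/loss idea in that passage a bit more concrete, here is a minimal sketch of functions being inherited down a tree with explicit, uncoupled gain and loss events.  This is my own toy illustration, not how PAINT is implemented, and the node names and the labels `F1` and `F2` are hypothetical.

```python
# Hypothetical family tree: each internal node maps to its child nodes.
children = {
    "root": ["dupA", "dupB"],
    "dupA": ["p1", "p2"],
    "dupB": ["p3", "p4"],
}

# Curated events (made up): which functions appear or disappear at a node.
gains = {"root": {"F1"}, "dupB": {"F2"}}   # F2 is gained without losing F1
losses = {"p2": {"F1"}}                    # F1 is lost without any gain


def propagate(node, inherited, children, gains, losses, out):
    """Pass functions from ancestor to descendant, applying gains and losses."""
    current = (inherited | gains.get(node, set())) - losses.get(node, set())
    kids = children.get(node, [])
    if not kids:                           # leaf: record its inferred functions
        out[node] = current
    for kid in kids:
        propagate(kid, current, children, gains, losses, out)
    return out


annotations = propagate("root", set(), children, gains, losses, {})
for protein in sorted(annotations):
    print(protein, sorted(annotations[protein]))
```

Here `p3` and `p4` end up with both `F1` and `F2`, `p1` keeps only `F1`, and `p2` ends up with nothing, which is the kind of uncoupled gain or loss that a pure "replace one function with another" model cannot express.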

I note – Paul Thomas, one of the authors here, has also been developing phylogeny driven functional prediction methods for many years and has done some cool things previously.  This new approach seems novel and useful and their paper is worth looking at.  I like too that they focus on MutS homologs for some of their examples:

Anyway – their paper is worth a read and some of their software tools may be of use including PAINT: http://sourceforge.net/projects/pantherdb/ and http://pantree.org

Good to see continuous developments in phylogeny driven functional predictions.  If you want to learn more – check out the Mendeley Group I have created:

http://www.mendeley.com/groups/1190191/_/widget/29/5/

And please contribute to it. Below are some previous posts of mine of possible interest: