Blast from the past: video of a talk I gave in 2006 #metagenomics

Just re-found this video and posted it to YouTube.  It is from a talk I gave at the first “International Metagenomics Meeting” in 2006.

I think one may still be able to view videos from the CalIT2/UCSD page here. But I thought it might be better to have this talk on YouTube than at the CalIT site so I posted it … hope they don’t sue me.

Note – I wrote a blog post about the meeting here:
The Tree of Life: Metagenomics 2006

Special Guest Post & Discussion Invitation from Matthew Hahn on Ortholog Conjecture Paper


I am very excited about today’s post.  It is the first of what I hope will be many posts from authors of interesting papers describing the “Story behind the paper”.  I write extensive, detailed posts about my own papers and have also tried to interview others about their papers if they are relevant to this blog.  But Matthew Hahn approached me recently about the possibility of him writing up some details on his recent paper on the functions of orthologs vs. paralogs.  So I said “sure” and set up a guest account for him to write up his comments and details of the paper.


For those of you who do not know, Matt is on the faculty at U. Indiana.  He was a post doc at UC Davis so I have a particular bias in favor of him.  But his recent paper has generated some controversy (I posted some links about it here).  So it is great to get some more detail from him.  In addition, I note, I am also using this approach to try and teach people how easy it is to write a blog post by getting them guest accounts on Blogger and letting them write up something with links, pictures, etc.  So hopefully we can get more scientists blogging too.


Anyway – without any further ado – here is Matt’s post:

———————————————————————–
Following Jonathan’s excellent example of how explaining the history of a project helps to illuminate how the process of science actually happens, I thought I’d start by giving a bit of history behind our study, and the paper that we recently published in PLoS Computational Biology (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002073). And then I’ll address the critics…
How this all got started
It all started a bit more than three years ago, in the summer of 2008. Pedja (as Predrag Radivojac is known to friends) was giving a talk to a group of us on protein function prediction that he also presented as a tutorial at the Automated Function Prediction SIG at ISMB 2008. Pedja and I had already collaborated on a small project involving the evolution of phosphorylation sites, but I really had no idea about his work on function prediction, and little idea in general about how function prediction was done. Reviewing different ways to accomplish transfer-by-similarity, he eventually got around to evolutionary (phylogenomic) approaches. Here is what I remember of this specific exchange during his talk:
Pedja: …and of course these methods only use orthologs for prediction, because orthologs have more similar functions than do paralogs.
me (from audience): Who says?
Pedja: Umm, you say. I mean, the evolutionary biologists say.
me: No, we don’t. I don’t know of any data that says any such thing.
Pedja: Whatever, Matt. We’ll talk about this later.
Well, we did talk about it later, and it turned out that although this claim is made in tons of papers, there is basically no data to support it. In the best cases a real example of one gene family will be cited, but there are very few of these. In the worst cases, the authors will just cite some random paper about gene duplicates (or Fitch’s original paper defining orthologs and paralogs). Of course I agree that patterns of sequence evolution might lead you to conclude this relationship was true, but there was no experimental data.


In fact, as we say in our paper, rarely did anyone recognize that this claim needed to be tested, or even that it was a claim that could be tested. At the time Eugene Koonin was the only person to say this: “The validity of the conjecture on functional equivalency of orthologs is crucial for reliable annotation of newly sequenced genomes and, more generally, for the progress of functional genomics. The huge majority of genes in the sequenced genomes will never be studied experimentally, so for most genomes transfer of functional information between orthologs is the only means of detailed functional characterization” (http://www.ncbi.nlm.nih.gov/pubmed/16285863). I really liked the way that Eugene had said this, and started to refer to the idea that orthologs were more functionally similar than paralogs as the “ortholog conjecture.” So to be clear: I completely made up this phrase, but used the most evocative word from the Koonin paper.
Luckily for Pedja and me we had just gotten a small internal grant to work on genome annotation and we had an incoming master’s student (Nathan Nehrt) who was willing to work on a project intending to test the ortholog conjecture.
Interlude: the crappy state of things in the study of the evolution of function
In order to test anything about how function evolves between orthologs and paralogs—or between any genes—one of course needs some kind of data on gene function in multiple species. And this turns out to be a big problem.
Because, as Koonin says in the earlier quote, the vast majority of experimental data comes from a very few species, and these species are not exactly closely related. Here is an approximate phylogeny of the major eukaryotic model organisms:
It’s obvious from this figure that if you need both 1) lots of functional data from two species, and 2) a pretty good idea of exactly what the homologous relationships are between the genes you’re studying, you’re going to have to study human and mouse.
This is actually a pretty bleak picture for people who study molecular evolution (as I do). While we have tons and tons of sequence data both within and between species, and a very good idea about how these sequences evolve, and fancy models with which to analyze these sequences…we know next to diddly-squat about general patterns relating these sequence differences to functional differences. There are lots of interesting things to be gleaned from studies of sequence evolution, but it really would be nice to know something about the relationship between sequence and function.
What we found
What exactly does the ortholog conjecture predict? In my mind, at least, it predicts something like this:
In this completely fictitious graph the relationship between protein function and sequence similarity is a declining one, only it declines faster for paralogs than it does for orthologs. Also, just possibly, gene duplicates start out with slightly diverged function the minute they appear. Anyway, those were our predictions.
But here is what we found (Figure 1 in Nehrt et al. 2011):

(Panel A uses the Biological Process ontology and panel B uses the Molecular Function ontology.)
There are really two different, equally surprising results here. First, there is no relationship between sequence divergence and functional divergence for orthologs (among 2,579 one-to-one orthologs between human and mouse). Absolutely none—it’s a straight horizontal line. Second, there is a relationship for paralogs (among 21,771 comparisons), exactly as we predicted there would be. So according to our results, paralogs actually have more conserved function than do orthologs. Our interpretation of the data was that the most important determinant of function was the organismal context in which a gene/protein found itself: given the same amount of sequence divergence, two proteins in the same organism would be more functionally similar. For orthologs, this means that the sequence divergence of our target gene was not the most important thing, but rather the sum total of divergence in all of the genes that contribute to its cellular context. Which is why all the orthologs have on average similar functional divergence—they are all exactly the same age and hence have approximately the same levels of divergence in these interactors (in this case sequence divergence for paralogs is a much better indicator of their splitting time).
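To make the comparison concrete, here is a minimal sketch of the kind of check described above: regress functional similarity on sequence divergence separately for orthologs and paralogs and compare the slopes. The numbers are made-up toy values chosen to mimic the pattern (flat for orthologs, declining for paralogs), and the helper names `ols_slope` and `slope_for` are hypothetical. This is a plain least-squares fit for illustration, not necessarily the statistical procedure used in the paper.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_for(pairs):
    """pairs: iterable of (sequence_divergence, functional_similarity)."""
    divergence, similarity = zip(*pairs)
    return ols_slope(divergence, similarity)


# Toy data (invented for illustration, NOT values from the paper).
ortholog_pairs = [(0.05, 0.60), (0.20, 0.60), (0.40, 0.60), (0.60, 0.60)]
paralog_pairs = [(0.05, 0.90), (0.20, 0.75), (0.40, 0.55), (0.60, 0.35)]

ortholog_slope = slope_for(ortholog_pairs)
paralog_slope = slope_for(paralog_pairs)
print("ortholog slope:", ortholog_slope)
print("paralog slope:", paralog_slope)
```

A slope near zero for orthologs alongside a clearly negative slope for paralogs is the pattern reported in the post; with real data one would also want confidence intervals on each slope.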
Without going through every result in the paper and our interpretation of every result, suffice it to say that after about a year-and-a-half of working on this (around February 2010), we were satisfied that we had something we were willing to submit. I even seem to remember showing the above figure to Jonathan on a visit to UC-Davis! So we did submit the paper, first to PNAS and then, after rejection, to PLoS Computational Biology, where it was rejected again.
The content of the reviews was approximately the same at both journals. Basically, people were not convinced of our results mostly because the functional relationships were all based on data in the Gene Ontology database. To be specific, the functional data we used came from experiments conducted in 12,204 different papers. We didn’t use any predicted functions, only functions assigned using experimental data. And we did A LOT of work to try to eliminate problems that might have affected our results, including repeating the main analysis using only GO terms common to both the human and mouse datasets. But there can still be bias hidden within these functional assignments because someone always has to interpret the experiment—to say that a yeast two-hybrid experiment means that a gene has function X. And because of these biases, people weren’t buying it.
To get a measure of functional similarity that did not depend on the interpretation of any experiments, we decided to repeat the entire analysis using microarray data, using the correlation in expression levels across 25 tissues as the measure of functional similarity. By this time Nathan was graduating and moving on to Maricel Kann’s lab as a research programmer, so we recruited one of Pedja’s Ph.D. students, Wyatt Clark, to pick up where Nathan had left off. (Wyatt had actually been a student in my undergraduate Evolution course a few years earlier, so we figured he knew something…) After repeating all of the GO-based analyses himself—always better to double-check, right?—Wyatt got all of the microarray data in order and produced this figure (Figure 4 in Nehrt et al. 2011):
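As a rough illustration of using expression correlation as a functional-similarity score, the sketch below computes a Pearson correlation between two genes' expression profiles across tissues. The profiles are invented toy values over five tissues rather than 25, the function name `pearson` is my own, and the choice of Pearson correlation is an assumption: the post says only "correlation in expression levels".

```python
from math import sqrt


def pearson(profile_a, profile_b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(profile_a)
    if n != len(profile_b) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_a * var_b)


# Toy expression levels across five hypothetical tissues (not real data).
gene_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_2 = [2.0, 4.0, 6.0, 8.0, 10.0]   # same tissue pattern as gene_1
gene_3 = [5.0, 4.0, 3.0, 2.0, 1.0]    # opposite tissue pattern

similar = pearson(gene_1, gene_2)
opposite = pearson(gene_1, gene_3)
print(similar, opposite)
```

The appeal of a score like this is that it is computed directly from measurements, so no curator has to interpret an experiment to produce it.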
So a year after we first submitted a paper, we submitted a new version to PLoS CB including the array analysis, and this was enough to convince the reviewers that our results were not merely due to some strange bias in GO.
The fallout, and some responses
First, let me say that I had some idea that this would be a controversial-ish paper, and that we’d get at least some blowback. For about the first 20 versions of the manuscript (including some submitted versions) I put the words “ortholog conjecture” in quotes in the title, never an endearing move. (Pedja finally convinced me to take them out of the latest submissions.) But I also thought people would be happier that an untested assumption had finally been tested, and we have definitely gotten some positive feedback along these lines, including several groups that told us they have data that support our findings. By coincidence my lab had another paper come out the same week as this one (http://www.ncbi.nlm.nih.gov/pubmed/21636278), and I mistakenly thought it would generate much more attention. I still think the biological importance of the results in that one is much greater than that of the ortholog conjecture results, but either because we didn’t publish in an open-access journal (Jonathan is always right) or simply because the function-prediction community is more active on the interweb tubes, there have been a surprising number of critical responses (partially collected here: http://phylogenomics.blogspot.com/2011/09/some-links-on-ortholog-conjecture-paper.html). So here are some responses to general critiques.
The ortholog conjecture says only that orthologs are similar.
Okay, this one is a bit unfair, as only one person has said this. The real problem here is that Michael Galperin seems to have deeply misunderstood what we mean by the ortholog conjecture. According to him the ortholog conjecture is “the assumption that orthologs (genes with a common origin that were vertically inherited from the same gene in the last common ancestor of the host organisms) typically retain the same function or have closely related ones.” Umm, no. In fact, if you really think this is what the ortholog conjecture says, then our results support it—human and mouse orthologs do typically have closely related functions. But we are explicitly testing for a difference between orthologs and paralogs, not whether or not orthologs retain any functions. At no point did we say (or even hint) that orthologs should not be used for functional prediction. The whole point of our analysis and conclusions is that we should stop ignoring paralogs, which would give us a ton more data to use for the prediction of functions.
The assignments of orthology and paralogy are incorrect.
This is an easy one: we did in fact get the definitions of in- and out-paralogs correct (and laid them out in Figure S1). According to Sonnhammer and Koonin: “Our definition of ‘outparalogs’ is: paralogs in the given lineage that evolved by gene duplications that happened before the radiation (speciation) event” (http://www.ncbi.nlm.nih.gov/pubmed/12446146). For the purposes of our study, this means that outparalogs are defined as any paralogs that diverged before the speciation event between human and mouse and inparalogs diverged after this speciation event. Outparalogs do not indicate only paralogs in two different species, though by necessity in our dataset inparalogs are only found in the same species (all in human or all in mouse). Therefore, with respect to our conclusion that the most important determinant of function is which genome you are found in (i.e. context), it wouldn’t matter if we had incorrect gene trees: we would never confuse two genes in the same species (either inparalogs or some of the outparalogs) with two genes in different species (all orthologs and the remaining outparalogs).
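The definitions above can be sketched in a few lines. This is only an illustration under assumptions: the study worked from gene trees, whereas this sketch takes a duplication age as a given input, and the `ParalogPair` class, the gene names, and the 90-million-year placeholder for the human-mouse split are all hypothetical.

```python
from dataclasses import dataclass

# Placeholder split age in millions of years; an illustrative number only,
# not a value taken from the post or the paper.
HUMAN_MOUSE_SPLIT_MYA = 90.0


@dataclass(frozen=True)
class ParalogPair:
    gene_a: str
    species_a: str
    gene_b: str
    species_b: str
    duplication_mya: float  # assumed age of the duplication that split the pair


def classify(pair, speciation_mya=HUMAN_MOUSE_SPLIT_MYA):
    """Outparalogs duplicated before the speciation event, inparalogs after it."""
    if pair.duplication_mya > speciation_mya:
        return "outparalog"
    return "inparalog"


def same_genome(pair):
    """True when both genes sit in the same species, the 'context' in the post."""
    return pair.species_a == pair.species_b


# Hypothetical gene names and ages, for illustration only.
old_pair = ParalogPair("geneX1", "human", "geneX2", "mouse", 300.0)
young_pair = ParalogPair("geneY1", "human", "geneY2", "human", 20.0)

for pair in (old_pair, young_pair):
    print(pair.gene_a, pair.gene_b, classify(pair), same_genome(pair))
```

Note that `same_genome` never looks at the duplication age or at any tree, which is the point made above: a wrong gene tree could mislabel an inparalog as an outparalog, but it could not turn a same-species pair into a between-species pair.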
You should have inferred functions yourselves
This is a fair suggestion, and not having enough time to annotate functions for 40,000 proteins would be a pretty weak excuse for doing good science. Instead…I’ll just say that it turns out professional curators are much better at assigning functions than even the original study authors (see http://www.ncbi.nlm.nih.gov/pubmed/20829821). Curators have a much broader view of the whole set of terms available in any ontology, and a much more consistent idea of how to apply these terms. My favorite line from the above cited article: “…because of the relatively low accuracy of the authors’ submissions, the use of authors’ annotations did not result in saving of curators’ time…”
GO is not appropriate for this analysis because it is biased.
This is the most frustrating criticism of our study, if only because it’s partly true: GO is biased. In our paper we actually detail several of these biases, including the observation that the same set of authors will give two proteins more-similar functions than will two different sets of authors. We tried very hard to attempt to control for these biases, though of course one cannot account for all of them. The most uncharitable part of this critique, however, has to be the fact that people conveniently forgot to say that our array analysis was completely distinct from the GO-based analysis (though it has its own issues), and that Burkhard Rost’s analysis of protein-protein interaction (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020079) was also completely free of any bias in GO and was consistent with all of our results.
More annoying than this, you’d think from some of the critiques of GO that it was some sort of fly-by-night operation that no one should ever depend on. I mean, c’mon—there are human curators and human experimenters and of course they’re all biased so badly one could never compare functions between proteins much less between species. What were we thinking? (Only that the original GO paper has been cited >7000 times.) Funnily enough, at several points during the course of this work Pedja suggested—only half-jokingly—that we should just assume the ortholog conjecture was correct and write a paper about how GO must be wrong. Seriously, though: one would think from the excuses people came up with for the problems inherent in GO that we should simply stop using it to, you know, predict function in other species. And we were applying it to two relatively closely related mammals, one of which is explicitly a model for the other.
What next?
Our paper laid out several explicit hypotheses about the evolution of function that arose from our findings. Unfortunately, testing any of these hypotheses will require a ton more functional data, in more than one species. I know there are multiple groups working to collect these sorts of labor-intensive datasets, and Pedja and I are thinking about doing it ourselves (with collaborators, of course!). Massive datasets that reveal protein function will always be a lot harder to collect than sequence data, especially ones free from biases.
So let’s get to it…

—————————

Note – Toni Gabaldón was trying to post a detailed response but Blogger kept cutting him off with a character limit.  So I have posted his response below.

I appreciate the effort by Matthew Hahn in explaining the story behind his paper on the so-called “Ortholog conjecture” and in facing some of the criticism. This paper attracted my interest, as it did that of many others who work on or just use orthology. For instance, it was chosen by one of my postdocs for our “Journal Club” meeting, and it was discussed during our last “Quest for Orthologs” meeting in Cambridge. I think it is raising a necessary discussion and therefore I think it is a good paper. This does not mean that I fully agree with the interpretation and conclusions ;-). I hope to modestly contribute to this debate with the following post.

I think one of the reasons this paper has caused so much debate is that the conclusions seem to challenge common practice (inferring function from orthologs), and could be interpreted as a call to change the strategies of genome annotation. I think, however, that one should interpret these results carefully before starting to annotate based on paralogous proteins. As I will discuss below, one of the problems is that we need to agree on what the conjecture is in order to then agree on how to test it. I see three main points that can be a source of confusion: i) the issue of what is actually stated by this conjecture, ii) the issue of annotation, and iii) the issue of time.

i) What is the “ortholog conjecture”?
Or in other terms, when should we expect orthologs to be more likely to share function than paralogs? Always? Of course not. All of us would agree that two recently duplicated paralogs are likely to be more similar in function than two distant orthologs, so it is obvious that the conjecture is not simply “orthologs are more similar in function than paralogs”. In reality the expectation that orthologs are more likely to be similar in function than paralogs, at least as I interpret it, is directly related to the effect that duplication has on functional divergence. If gene duplication has some effect on functional divergence (even if not in 100% of the cases), then, all other things being equal (divergence time, history of speciation/duplication events, except for the duplication defining the orthologs), one would expect orthologs to be more likely to conserve function.

I think this complexity is not well considered (by many authors, in general). Hahn refers to the famous review of orthology by Koonin (2005) as the source for the term “ortholog conjecture”. However, in that paper this conjecture is always discussed within the context of genes across two particular species, whereas in Hahn’s paper it is extended to other contexts as well. Thus, the proper context in which to test this conjecture is only between orthologs and between-species paralogs. As we can see, the red and purple lines in figure 2 of Hahn’s paper do not show any clear difference.

Secondly, Koonin was very cautious in his paper, stating that he was referring to “equivalent functions” and not exactly the same “function”, correctly implying that the functional contexts would be different in the two different species. This brings me to the next point.

ii) annotation
If the expectation of functional conservation of orthologs refers to a given pair of species, then it makes no sense to test that expectation between paralogs within the same species and orthologs in different species. We were interested in this issue and it took us some effort to control for this “species” influence on the comparison. If you are interested you can read our paper on divergence of expression profiles between orthologs and paralogs (http://www.ncbi.nlm.nih.gov/pubmed/21515902).

As Hahn found, and as was anticipated by Koonin in that review, there is a huge influence of the “species context”, a big constraint on what fraction of the function is shared. Indeed I think it is the dominant signal in Hahn’s paper. Why is that? One possibility is that the functional context determines the function; I agree. However, we should not discard biases in how the different communities working on each model species define processes and functions, or in the types of experiments that are usually done. For instance, experimental inference from KO mutants might be common in mouse, but I guess this is not the case in humans (!!). I think this may be having a big influence and might even be the dominant signal in Hahn’s paper.

Finally, function has many levels, and I expect subfunctionalization to mostly affect the lower (i.e. more specific) levels. Biases may also exist in the level of annotation between species or between families of different sizes (which contribute more or less to the ortholog and paralog classes).

Microarray data are less likely to be subject to biases (although some may exist); at least they should be expected to be free of “human interpretation biases”, so Hahn and colleagues did well, in my opinion, in testing that dataset. It is important to note that for microarrays, and for orthologs and between-species paralogs (which I think is the right frame for testing the conjecture), orthologs are more likely to share an expression context. This is compatible with what we found in the paper mentioned above, and compatible with the ortholog conjecture as stated by Koonin (across species).

iii) time
Finally, one aspect which I think is fundamental is the notion of “divergence time”. Since paralogs can emerge at different time-scales, they make up a heterogeneous set of protein pairs. Most comparisons of orthologs and paralogs (Hahn’s as well) use sequence divergence as a proxy for time. However, this is only a poor estimate, especially when duplications (as here) are involved (we explored this issue in the past: http://www.ncbi.nlm.nih.gov/pubmed/21075746). This means that for a given divergence time paralogs may have larger sequence divergence than orthologs at the same divergence time, or the reverse (if gene conversion is playing a role). Is the conjecture based on sequence divergence or on divergence time? I think the initial sense of using orthology to annotate across species is based on the notion of comparing things at the same evolutionary distance. Thus basing our conclusions on divergence times might not be the proper way of doing it.

CONCLUSIONS AND PROPOSAL FOR RE-STATEMENT

To conclude, and with the intention of going beyond this particular paper, I would finish by saying that the key to the problem lies in how we interpret the so-called “ortholog conjecture”, or in what our expectations are about how function evolves. What I get from re-reading Eugene Koonin’s paper, and how I use that “assumption” in my day-to-day work, is the following:

“Orthologs in two given species are more likely to share equivalent functions than paralogs between these two species”

Therefore the notion of “across the same pair of species” is important and thus only part of the comparisons made by Hahn and colleagues could directly test this. Looking at the microarray and between-species comparisons data, the conjecture may even hold true!!

I, however, do think that the conjecture as stated above is limited and does not capture the complexity of orthology relationships. Indeed we, and many other researchers, are tuning the confidence of the orthology-based annotation based on whether the orthologs are one-to-one, one-to-many or many-to-many, or even on whether the orthologs are “super-orthologs” (with no duplication event in the lineages separating the two orthologs).

Since the underlying assumption of the ortholog conjecture is that duplication may (not necessarily always) promote functional shifts, many-to-many orthology relationships will tend to include orthologous pairs with different functions.

 Thus I would re-state the conjecture (or expectation) as follows:

 “In the absence of additional duplication events in the lineages separating them, two orthologous genes from two given species are more likely to share equivalent functions than two paralogs between these two species”

 This would be a more conservative expectation, which is closer to the current use of orthology-based annotation that tends to identify one-to-one orthologs, rather than any type.

When duplications start appearing in subsequent lineages, thus creating one-to-many or many-to-many orthology relationships, the situation is less clear. Following the assumption that duplications may promote functional divergence, one could expand the conjecture to: “the more duplications in the evolutionary history separating two genes, the lower the expectation that these two genes would share equivalent functions”.
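One way to make the expanded conjecture operational is to count the duplication nodes on the path separating two genes in a gene tree. The sketch below is only an illustration of that idea: the tiny tree, the gene names, and the functions `ancestors` and `duplications_between` are all hypothetical, and it assumes every internal node has already been labeled as a speciation or a duplication.

```python
def ancestors(node, parent):
    """Return [node, its parent, ..., root]."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def duplications_between(gene_a, gene_b, parent, event):
    """Count duplication nodes on the tree path joining two genes.

    Assumes both genes belong to the same tree (share a root).
    """
    chain_a = ancestors(gene_a, parent)
    chain_b = ancestors(gene_b, parent)
    in_b = set(chain_b)
    # The first ancestor of gene_a that is also an ancestor of gene_b.
    lca = next(n for n in chain_a if n in in_b)
    path = chain_a[: chain_a.index(lca) + 1] + chain_b[: chain_b.index(lca)]
    return sum(1 for n in path if event.get(n) == "duplication")


# Toy gene tree (invented): a speciation S1 splits human from mouse, then a
# duplication D1 in the mouse lineage gives two mouse copies.
parent = {
    "human_A": "S1",
    "D1": "S1",
    "mouse_A1": "D1",
    "mouse_A2": "D1",
    "human_B": "S2",   # a second, duplication-free family
    "mouse_B": "S2",
}
event = {"S1": "speciation", "D1": "duplication", "S2": "speciation"}

one_to_one = duplications_between("human_B", "mouse_B", parent, event)
one_to_many = duplications_between("human_A", "mouse_A1", parent, event)
print(one_to_one, one_to_many)
```

Under the expanded conjecture, the one-to-one pair with no intervening duplication would carry a higher expectation of equivalent function than the one-to-many pair.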

I wrote this contribution on the fly, and surely there are ways of expressing this in more appropriate terms. In any case I hope I made clear the idea that the conjecture emerges from the notion of duplications causing functional shifts and that our expectations will be clearer if expressed in those terms. This goes along the lines of what Jonathan Eisen mentioned about considering the whole phylogenetic history to annotate genes.

From this perspective, the really important hypothesis is that “duplications tend to promote functional shifts”. I think this is based on solid grounds and has been tested intensively in the past.

 Cheers,

Toni Gabaldón

http://treevolution.blogspot.com

C-DEBI Research Support > Request for Research Proposals

Katrina Edwards on the Atlantis

I have always been fascinated by life in extreme places on the planet. And somehow I have managed to do projects on microbes from places like Antarctica, boiling hotsprings in Yellowstone and Kamchatka, acid pools, and more. The extremes are fascinating to me because they tell us a lot about the limits of life as well as indirectly about life in “normal” places.

And of course, I am not alone. Many many scientists are fascinated by life’s extremes. But not everyone ends up studying life in extreme environments of course. One reason for this is that many extreme environments that might be of interest are kind of hard to study. Consider the deep sea. Not so easy to do work there and just getting samples can be a massive undertaking.

Just imagine though. What if there were a way to “tag along” on an existing project studying life’s extremes at no cost to you or your grants? Even better what if there were a way to get extra funds to not just tag along on a project but to carry out detailed research at the same time?

Well, amazingly, there is such a chance right now. The C-DEBI (“Center for Dark Energy Biosphere Investigations”) project is calling for proposals. C-DEBI Research Support > Request for Research Proposals

They have money. They have drills. They have been and will continue to be collecting lots of samples from the bottom of the ocean and the crust below.  They are doing a bunch of microbiology (as well as other things). And they are calling for people out there to join them in various ways including;

And if you are interested they are heading out in a few days on a cruise to study the seafloor at “North Pond”, a site at the bottom of the ocean on the Mid-Atlantic Ridge. For more information about this cruise see

I note – I was a visiting scientist for a few days at one of the C-DEBI meetings about evolution earlier this year. It was a great meeting – on Catalina Island – and I wrote a VERY long blog post about it: The Tree of Life: A “work” trip to Catalina Island: USC, Wrigley, C-DEBI, dark energy biosphere, Virgin Oceanic, Deep Five, & more. You can learn more about the C-DEBI project by reading that post.  And you can look at my pretty pictures below:

I note in addition, I am forever in debt to Katrina Edwards, the PI of the C-DEBI project, ever since she gave a frigging awesome tour of the Atlantis to my kids when it was docked in San Francisco.

But regardless of the personal connections I have to C-DEBI, the project is very interesting and the fact that they are offering up funds to support “outsiders” who want to participate in the project in some way is great.

Great paper showing the potential power of comparative and evolutionary genomics in #PLoS Genetics

There is a wonderful paper that has just appeared in PLoS Genetics I want to call people’s attention to: PLoS Genetics: Emergence and Modular Evolution of a Novel Motility Machinery in Bacteria

In the paper, researchers from CNRS and Aix-Marseille in France used some nice comparative and evolutionary genomics analyses along with experimental work to characterize the function and evolution of gliding motility in bacteria.

Their summary of their work:

Motility over solid surfaces (gliding) is an important bacterial mechanism that allows complex social behaviours and pathogenesis. Conflicting models have been suggested to explain this locomotion in the deltaproteobacterium Myxococcus xanthus: propulsion by polymer secretion at the rear of the cells as opposed to energized nano-machines distributed along the cell body. However, in absence of characterized molecular machinery, the exact mechanism of gliding could not be resolved despite several decades of research. In this study, using a combination of experimental and computational approaches, we showed for the first time that the motility machinery is composed of large macromolecular assemblies periodically distributed along the cell envelope. Furthermore, the data suggest that the motility machinery derived from an ancient gene cluster also found in several non-gliding bacterial lineages. Intriguingly, we find that most of the components of the gliding machinery are closely related to a sporulation system, suggesting unsuspected links between these two apparently distinct biological processes. Our findings now pave the way for the first molecular studies of a long mysterious motility mechanism.

Basically, they started with some genetic and functional studies in Myxococcus xanthus.  They analyzed these in the context of the genome sequence (note – I was a co-author on the original genome paper).  And then they did some extensive comparative and evolutionary analysis of these genes, producing some wonderful figures along the way such as:

Figure 2. Taxonomic distribution of the closest homologues of the 14 genes composing the G1, G2, and M1 clusters, and genetic organization of the core complex. (A) For a given gene, the number of homologues in the corresponding genome is indicated by the numbers within arrows. The relationships between the species carrying the different homologues of the genes are indicated by the phylogeny on the left. Based on their taxonomic distribution, the 14 genes can be divided into Group A (grey background) and Group B (white background). (B) In all non Deltaproteobacteria and in Geobacter, the Group B genes clustered in a single genomic region.  doi:10.1371/journal.pgen.1002268.g002  


Based on their analysis they then came up with some hypotheses as to which genes were involved in key parts of gliding motility and what their biochemical functions were and they then went and confirmed this with experiments.  I am not going to go into detail on the functional work they did but you can read their paper for more details.

They wrapped up their paper by proposing a model for the evolutionary history of gliding motility.  I am not sure I buy all components of their model since our sampling of genomes right now is still very poor, but they have a pretty detailed theory captured in part in this figure:

Figure 8. Evolution and structure of the Myxococcus gliding motility machinery. A) Evolutionary scenario describing the emergence and evolution of the gliding motility machinery in M. xanthus. The relationships between organisms carrying close homologues of the 14 genes encoding putative components of the gliding machinery in M. xanthus are represented by the phylogeny. Green and red arrows respectively indicate gene acquisition and gene loss. The number of gene copies that were acquired or lost is indicated within arrows. The purple dotted arrows represent horizontal gene transfer events of one or several components. WGD marks the putative whole genome duplication event that occurred in the ancestor of Myxococcales. For each gene, locus_tag, former (agm/agl/agn) and new (glt and agl) names are provided. The number of complete genomes that contain homologues of glt and agl genes compared to the total number of complete genomes available at the beginning of this study are indicated in brackets. (B) The Myxococcus gliding machinery. The diagram compiles data from this work and published literature. Components were added based on bioinformatic predictions, mutagenesis, interaction and localization studies. Exhaustive information is not available for all proteins and thus the diagram largely is subject to modifications once more data will be available. Known interactions within the complex from experimental evidence are AglR-GltG, AglZ-MglA and interactions within the AglRQS molecular motor [13], [15]. For clarity, the proteins were colour-coded as in the rest of the manuscript 

Anyway – I don’t have much time right now to provide more detail on the paper.  But it is definitely worth checking out.

What is a nice chloroplast like you doing in a parasite like that?

Cool new paper from Joe DeRisi’s lab: PLoS Biology: Chemical Rescue of Malaria Parasites Lacking an Apicoplast Defines Organelle Function in Blood-Stage Plasmodium falciparum, by Ellen Yeh and Joseph L. DeRisi. doi: 10.1371/journal.pbio.1001138

In it they use some experimental techniques to try and track down the elusive function of the apicoplast in Plasmodium falciparum, the causative agent of malaria.  The apicoplast is an organelle that is evolutionarily derived from chloroplasts (and thus derived originally from cyanobacteria).  Due to its cyanobacterial origins, many have thought that it might serve as a good target for drugs to try and kill Plasmodium species because, in theory, such drugs, if specific, should not have significant detrimental effects on hosts like humans due to our lack of known important cyanobacterial associates.

Here is their abstract:

Plasmodium spp parasites harbor an unusual plastid organelle called the apicoplast. Due to its prokaryotic origin and essential function, the apicoplast is a key target for development of new anti-malarials. Over 500 proteins are predicted to localize to this organelle and several prokaryotic biochemical pathways have been annotated, yet the essential role of the apicoplast during human infection remains a mystery. Previous work showed that treatment with fosmidomycin, an inhibitor of non-mevalonate isoprenoid precursor biosynthesis in the apicoplast, inhibits the growth of blood-stage P. falciparum. Herein, we demonstrate that fosmidomycin inhibition can be chemically rescued by supplementation with isopentenyl pyrophosphate (IPP), the pathway product. Surprisingly, IPP supplementation also completely reverses death following treatment with antibiotics that cause loss of the apicoplast. We show that antibiotic-treated parasites rescued with IPP over multiple cycles specifically lose their apicoplast genome and fail to process or localize organelle proteins, rendering them functionally apicoplast-minus. Despite the loss of this essential organelle, these apicoplast-minus auxotrophs can be grown indefinitely in asexual blood stage culture but are entirely dependent on exogenous IPP for survival. These findings indicate that isoprenoid precursor biosynthesis is the only essential function of the apicoplast during blood-stage growth. Moreover, apicoplast-minus P. falciparum strains will be a powerful tool for further investigation of apicoplast biology as well as drug and vaccine development.


The author summary is a bit nicer in my opinion:

Malaria caused by Plasmodium spp parasites is a profound human health problem that has shaped our evolutionary past and continues to influence modern day with a disease burden that disproportionately affects the world’s poorest and youngest. New anti-malarials are desperately needed in the face of existing or emerging drug resistance to available therapies, while an effective vaccine remains elusive. A plastid organelle, the apicoplast, has been hailed as Plasmodium’s “Achilles’ heel” because it contains bacteria-derived pathways that have no counterpart in the human host and therefore may be ideal drug targets. However, more than a decade after its discovery, the essential functions of the apicoplast remain a mystery, and without a specific pathway or function to target, development of drugs against the apicoplast has been stymied. In this study, we use a simple chemical method to generate parasites that have lost their apicoplast, normally a deadly event, but which survive—“rescued” by the addition of an essential metabolite to the culture. This chemical rescue demonstrates that the apicoplast serves only a single essential function, namely isoprenoid precursor biosynthesis during blood-stage growth, validating this metabolic function as a viable drug target. Moreover, the apicoplast-minus Plasmodium strains generated in this study will be a powerful tool for identifying apicoplast-targeted drugs and as a potential vaccine strain with significant advantages over current vaccine technologies.

Also see their press release here.

Basically they are trying to use various experimental tricks to figure out which functions of the apicoplast are essential.  Many theories have been proposed over the years as to what the apicoplast is doing.  But few have gained significant evidence.  This paper is an important contribution because it suggests that one pathway in particular is most functionally important: the isopentenyl pyrophosphate (IPP) synthesis pathway.  See their model below:

Figure 5. Model of apicoplast function.
(Top) The essential function of the apicoplast is the production of isoprenoid precursors, IPP and DMAPP, which are exported into the cytoplasm and used to synthesize small molecule isoprenoids and prenylated proteins. Parasites that are unable to synthesize isoprenoid precursors either due to inhibition of the biosynthetic pathway by fosmidomycin or loss of the apicoplast following doxycycline inhibition can be chemically rescued by addition of exogenous IPP (red). The exogenous IPP enters the host cell through unknown membrane transporters and fulfills the missing biosynthetic function. (Bottom) Reaction scheme for MEP pathway biosynthesis of IPP and DMAPP with the enzymatic step inhibited by fosmidomycin indicated.

Anyway – I have always been fascinated by apicoplasts because they are so weird.  They reflect a strange evolutionary history of Apicomplexans in that this is a eukaryotic lineage that at some point brought into itself an entire photosynthetic algal cell as a symbiont.  And for reasons still unknown (if there are reasons …) the chloroplast of the algal symbiont was retained while most of the rest of the symbiont was ditched.  So that the resulting cells looked something like this:

From http://wiki.ericmajinglong.com/index.php?title=A_special_case:_The_apicomplexan_plastid

Evolution is indeed very weird.  And once it was discovered that the apicoplast was in fact derived from chloroplasts (this was discovered using molecular phylogenetics; e.g., see http://www.sciencedirect.com/science/article/pii/016668519490149X), people have been wondering if it might make a good drug target.  But people have also been wondering – what do Apicomplexans do with a chloroplast-like organelle when they do not photosynthesize?  So the DeRisi paper is interesting both from a drug treatment point of view and from an evolution point of view.

Anyway – here are some other links worth looking at:

My science communication hero/heroine of the month – Dr. Kiki @drkiki

Been working on revising my lab’s web site and was looking for some videos of talks I have given online to post there.  And I discovered/rediscovered this video of an interview I did for Dr. Kiki’s Science Hour.  Here it is:

NOTE – AT LEAST TEMPORARILY REMOVING THE VIDEO DUE TO MALWARE INFECTION OF TWIT.TV SITE

Now I know – this is over a year old. But I just watched the full video. Not so bad I think.

As many of you know, I like to talk.  And talk.  And talk.  But I would like to say that as an interviewer, Dr. Kiki is pretty frigging awesome.  Don’t know how she does it.  But I am going to post this video on the new lab page and point people to it if they want to know what my lab does and what I am interested in.

But enough about me.  I want to thank Dr. Kiki for this great interview by saying a little bit about her.  Or, well, her work in science communication.

As some of you may know, I listen to podcasts of TWIS – This Week in Science frequently on my bike rides to work.  And I really recommend anyone/everyone out there give it a whirl.  It is sort of like Science Friday but it is a bit edgier, a bit funnier, a bit goofier, and a bit sciencier (is that a word?)  Dr. Kiki and Justin on it are great and it is so good that I frequently sit outside my building listening to the end of a show if I take the short ride to work which is less than an hour.  So if you like Science – you really should check out the TWIS web site and find some way to listen such as what I do by subscribing to their podcasts at iTunes.

And I guess now I will be checking out “Dr. Kiki’s Science Hour” more after rewatching this video.  There are many many more shows at twit.tv/kiki.  I have not checked out as many of these as I have TWIS shows, but the ones I have watched are great.

And if you want to follow her more directly check out her Blog: The Bird’s Brain, or her twitter feed (@drkiki) or her Google+ feed.

Very proud that she is a UC Davis alum … and just want to say thanks to her for giving me a video I can share with others that says more about me and my lab than almost anything I have written.

Get to know Jack & the story behind the paper by @gilbertjacka "Defining seasonal marine microbial community dynamics"

A few days ago I became aware of the publication of a cool new paper: “Defining seasonal marine microbial community dynamics” by Jack A. Gilbert, Joshua A Steele, J Gregory Caporaso, Lars Steinbrück, Jens Reeder, Ben Temperton, Susan Huse, Alice C McHardy, Rob Knight, Ian Joint, Paul Somerfield, Jed A Fuhrman and Dawn Field.  The paper was published in the ISME Journal and is freely available using the ISME Open option. If you want to know more about Jack (in case you don’t know Jack, or don’t know jack about Jack) check out some of his material on the web like his Google Scholar page, his twitter feed, his LinkedIn page, and his U. Chicago page. But rather than tell you about Jack or the paper, I thought I would send some questions to the first author, Jack Gilbert, and see if I could get some of the “story behind the paper” out of him.  Since Jack likes to talk (and email and do things on the web), I figured it was highly likely I could get some good answers.  And indeed I was right. Here are his answers to my quickly written up questions (I have been out of the office due to family illness).


1. Can you provide some detail about the history of the project … How did it start ? What were the original plans ? (not this much sequencing I am sure)

The Western English Channel has been studied for over 100 years, and is in fact the longest studied marine site in the world. It is the home, essentially, of the Marine Biological Association, and has a long history. The idea to start contextualizing the abundant metadata (www.westernchannelobservatory.org) was started in 2003 by Ian Joint, a senior researcher at Plymouth Marine Laboratory (www.pml.ac.uk), who saw the benefit of collecting microbial life on filters and storing these at -80C. It was his vision to create and maintain this collection that enabled us to go back through this frozen time series and explore microbial life. I started working for PML in 2005, and basically was charged with trying to identify a potential technique to characterize the microbial life in these samples. Initially we got funding through the International Census of Marine Life to perform 16S rDNA V6 pyrosequencing on 12 samples. We chose 2007 as the first year, almost arbitrarily, and published that work in Environmental Microbiology in 2009 (http://onlinelibrary.wiley.com/doi/10.1111/j.1462-2920.2009.02017.x/abstract). However, we had already decided to go ahead, and with help from Dawn Field (Center for Ecology and Hydrology, UK) we were able to secure funding to pyrosequence 60 further amplicon samples; essentially we did 2003-2008. We deposited all these in the ICoMM dataset (link below) and it quickly became the largest study in the series. This was also a gold standard study for the Genomic Standards Consortium’s MIMARKS checklist (http://www.nature.com/nbt/journal/v29/n5/full/nbt.1823.html). We published the first analysis of these data in Nature Precedings in 2010 (http://precedings.nature.com/documents/4406/version/1).
We continued to characterize the microbial communities of the L4 sampling site in the Western English Channel by employing metagenomics and metatranscriptomics alongside more 16S rRNA V6 pyrosequencing across diel and seasonal time scales throughout 2008 (the final year of the 6-year time series). This study was published in PLoS ONE, also in 2010 (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0015545). This study also included our first analysis of archaeal diversity in the English Channel, which was also funded through the ICoMM initiative. We owe a lot to Mitch Sogin’s group for the first attempts at data analysis for the 16S rDNA profiles. We had a lot of difficulty getting the message right for the 6-year paper that was recently published in ISME J. Basically it was an issue of sequencing data as Natural History: we were generating data catalogs, and not doing enough to characterize the ecological interactions that occurred there.  So we reached out to the community, and found research groups who could help us plug that gap. Those involved Rob Knight’s team, Alice McHardy’s team, and Jed Fuhrman’s team. We worked a lot on improving this paper, and had some valuable help from a wide selection of other researchers, including Steven Giovannoni and Doug Bartlett, among many others.

The publication of this study, however, is just the start.

2. Who collected the samples? Any good field stories?

Samples were all collected by the fantastic boat staff at Plymouth Marine Laboratory, who routinely go out every Monday morning to collect water and specific samples for the whole laboratory. They were the life blood of that organization. One story I always like to relate is that during the 2008 sampling season, which generated samples for both the new ISME J paper (http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2011107a.html) and the 2010 PLoS ONE paper (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0015545), we wanted to get diel sampling during the winter, spring, and summer. Unfortunately the only time I could convince my group to go out sampling for 24 hours was during the summer….sometimes science is limited by enthusiasm ;-). Also, the site is outside the Plymouth Sea Wall – which I think is still the largest concrete structure in the UK and was built in the 19th century – so taking people out to see the site (for what it was worth ;-)) meant taking them into usually very choppy water….which made people quite sick sometimes.

In May 2009, J. Craig Venter and his crew came through to start the European leg of this Global Ocean Sampling expedition at L4, specifically the Western English Channel. Together, our team at PML on our fishing boat, Plymouth Quest, and his team on board the 100ft yacht, Sorcerer II, sampled L4 and E1 (another monitoring site) in the Western English Channel. Excitingly, these data form the first part of the attempt to start cataloguing the viral and Eukaryotic metagenomic and metatranscriptomic analysis of these communities. This analysis is also being further characterized using meta-metabolomics run by Carole Llewellyn at PML and Mark Viant at University of Birmingham, increasing the multi’omic nature of these data.

3. Can you give some web links for data, people involved , etc?

  • People on the paper – not an exhaustive list of those involved….this is a huge community effort.

4. What else do you want people to know ?

We have recently started to model the English Channel from both a taxonomic and functional perspective. I have attached a presentation that has cool gifs that demonstrate this; people can email me and request the gifs if necessary. These are generated by Peter Larsen at Argonne National Laboratory.

This modelling is being driven by two new tools:

(1) Predicted Relative Metabolic Turnover, which uses functional annotations from metagenomes to create predicted metabolomes, which enable us to accurately predict the turnover (relative consumption or production) of more than 1000 metabolites in the English Channel (http://www.microbialinformaticsj.com/content/1/1/4).

(2) Microbial Assemblage Prediction, which enables the prediction of the relative abundance of every bacterial taxon at any given location and time; the predictions are driven by in situ or remotely modeled environmental parameter data. We used satellite data to produce the figures above, truly BUGS FROM SPAAAAACCCCCEEEE…..

This is the new paradigm – creating information and predictive models from data – no longer will metagenomics be descriptive Natural History – it is now becoming ECOLOGY. These tools will form the cornerstone of the Earth Microbiome Project’s (www.earthmicrobiome.org) data analytical initiative to create predictive models of microbial taxonomic community abundance structure and functional capability, defined as the ability of a community to turn over metabolites.

Note – as a bit of a side story – I am disappointed in the ISME Journal’s “Open” option for publishing which, though it uses a Creative Commons license, uses a pretty narrow one that says, for example, “You may not alter, transform, or build upon this work.” That is pretty limiting.  It means, for example, that the text cannot be reworded into a database of full text of papers where one uses intelligent language processing methods to play with the text.  It also means technically I probably cannot take the figures and modify them in any way to, for example, make an interesting movie using them.  Imagine if Genbank worked this way.  Imagine if you could only look at sequences but could not make alignments of them.  It is, well, not very open. So really this should be called the ISME “No charge” option or something like that since this is not “open access” to me – I think “open access” should really be reserved for material that is free of charge and free of most/all use restrictions (I prefer the broader version of the “open access” definition described by Peter Suber).  Sure – the fact that ISME makes some stuff available at no charge is nice.  And that they use CC licenses is good too since these are very straightforward to interpret compared to other licenses.  But their use of the no derivatives option seems silly.

Anyway – nice paper.  And I hope some of the story behind the paper is useful to people.

Reference:

Gilbert JA, Steele JA, Caporaso JG, Steinbrück L, Reeder J, Temperton B, Huse S, McHardy AC, Knight R, Joint I, Somerfield P, Fuhrman JA, & Field D (2011). Defining seasonal marine microbial community dynamics. The ISME journal PMID: 21850055

What is in a name? A case study of genomic epidemiology w/ Bacillus cereus and Bacillus anthracis

There is a very interesting new paper that just came online in the Archives of Pathology: Rapidly Progressive, Fatal, Inhalation Anthrax-Like Infection in a Human: Case Report, Pathogen Genome Sequencing, Pathology, and Coordinated Response

I was alerted to the paper by Eileen Choffnes of the National Academy of Sciences Institute of Medicine Forum on Microbial Threats (which I am a member of).  In the paper, James Musser, Angela Wright and colleagues discuss the use of genome sequencing in the characterization of a fatal infection with a bacterium that appeared to be a species of Bacillus.  Their summary is below and pretty much sums it up:

Context.—Ten years ago a bioterrorism event involving Bacillus anthracis spores captured the nation’s interest, stimulated extensive new research on this pathogen, and heightened concern about illegitimate release of infectious agents. Sporadic reports have described rare, fulminant, and sometimes fatal cases of pneumonia in humans and nonhuman primates caused by strains of Bacillus cereus, a species closely related to Bacillus anthracis.

Objectives.—To describe and investigate a case of rapidly progressive, fatal, anthrax-like pneumonia and the overwhelming infection caused by a Bacillus species of uncertain provenance in a patient residing in rural Texas.

Design.—We characterized the genome of the causative strain within days of its recovery from antemortem cultures using next-generation sequencing and performed immunohistochemistry on tissues obtained at autopsy with antibodies directed against virulence proteins of B. anthracis and B. cereus.

Results.—We discovered that the infection was caused by a previously unknown strain of B. cereus that was closely related to, but genetically distinct from, B. anthracis. The strain contains a plasmid similar to pXO1, a genetic element encoding anthrax toxin and other known virulence factors. Immunohistochemistry demonstrated that several homologs of B. anthracis virulence proteins were made in infected tissues, likely contributing to the patient’s death.

Conclusions.—Rapid genome sequence analysis permitted us to genetically define this strain, rule out the likelihood of bioterrorism, and contribute effectively to the institutional response to this event. Our experience strongly reinforced the critical value of deploying a well-integrated, anatomic, clinical, and genomic strategy to respond rapidly to a potential emerging, infectious threat to public health.

The part in which I am interested, not surprisingly, is the genomic-evolution part.  This is of interest since Bacillus anthracis is in a way a subspecies of a larger clade of bacterial types that includes Bacillus cereus and Bacillus thuringiensis.  These two generally do not cause fatal disease in humans, though I believe there are prior cases (see for example, Anthrax, but not Bacillus anthracis?).  I note that B. thuringiensis is also known as Bt and is used extensively in agriculture to kill pests.  Anyway, though Bt and Bc are known to occasionally cause humans trouble, without a doubt, if you say you have found a case of Bacillus anthracis in people, some serious freaking out will occur at some level.  People will want to know things – like is this a natural occurrence or a purposeful attack?  And in the end, there is a lot associated with the name since of course “anthrax” scares people.

So the authors here basically sequenced the genome of this isolate and then did some detailed phylogenomic analysis to place it in the Bacillus cereus/thuringiensis/anthracis group.  See below:

And it turns out, this strain is not in the anthrax portion of the tree.  It is clearly, phylogenetically, in the Bacillus cereus/thuringiensis part of this clade.  I am not so sure the genomics was necessary here.  I think some detailed MLST and/or phylogenetics of other variable markers might have done the same trick.  But still, the resolution one gets from the phylogenomics is pretty good.  This is of course nothing really new.  There have been some nice genomic epidemiological studies done in the last year or two, such as for the German E. coli outbreak and in some other cases (see, for example: Prospective Genomic Characterization of the German Enterohemorrhagic Escherichia coli O104:H4 Outbreak by Rapid Next Generation Sequencing Technology; Origins of the E. coli Strain Causing an Outbreak of Hemolytic-Uremic Syndrome in Germany; The 2011 Shiga toxin-producing Escherichia coli O104:H4 German outbreak: a lesson in genomic plasticity; Open-Source Genomic Analysis of Shiga-Toxin-Producing E. coli O104:H4; and Pathogens: Genes and Genomes).
Though this is not per se new, this paper focuses a bit more on the pathology part of the story and even ends by linking to a new paper from one of the authors on how this type of work changes how we should conceive of pathology training: 

One of us recently proposed the inception of a third, training track in pathology termed genomic pathology, designed to complement the traditional anatomic and clinical pathology tracks. As the introgression of genome-scale analyses proceeds rapidly and inexorably into contemporary patient care and pathology practice, the career opportunities for this new type of trainee will increase considerably, and new patient-care niches will be created. We believe cases such as this highlight the need for, and potential utility of, a cadre of pathologists trained and facile in genomic pathology.

The paper on Genomic Pathology (thank goodness they did not invent a new omics word) can be found here.  There is no doubt we are in a new era.  Genomic sequencing is certainly going to be used in more and more cases like this and we definitely need to change the training paradigm if we want more people to use and understand it.
Some other reading worth checking out:
Some videos of interest

Reference:
Wright AM, Beres SB, Consamus EN, Long SW, Flores AR, Barrios R, Richter GS, Oh SY, Garufi G, Maier H, Drews AL, Stockbauer KE, Cernoch P, Schneewind O, Olsen RJ, & Musser JM (2011). Rapidly Progressive, Fatal, Inhalation Anthraxlike Infection in a Human: Case Report, Pathogen Genome Sequencing, Pathology, and Coordinated Response. Archives of pathology & laboratory medicine PMID: 21827220

Notes from a trip to Woods Hole, MA to teach #genomics at the MBL Microbial Diversity Course

Here are some notes from my recent trip to Woods Hole, MA where I went to give a talk for the Marine Biological Lab “Microbial Diversity Course”.

Day 1:  Thursday

My trip started quite poorly.  I wrote a whole post on the first day so if you want more detail go here: A squatter’s journey to the Marine Biological Lab (MBL).  I posted (of course) to twitter along the way.  Here are some of my posts:

  • Heading to Woods Hole/MBL-giving talk for symposium for the Microbial Diversity Class  
  • Anyone out there recommend best way to get from Logon to Woods Hole after 10:30 PM (no Peter Pan bus) w/o renting car? 
  • Thank you Delta for out early arrival in MSP- not so many thanks for sitting on runway for 20 minutes ad more waiting for gate
  • Yhgtbfkm – we finally got to a gate at MSP and the gate agents keep missing our door with jetway
  •  maybe I’ll see you as I head to my connection
    • Had a long twitter conversation with her about the fact that both of our flights were becoming disasters
  • Plane was very late bit now in a nice Prius from Green Shuttles on way to Woods Hole  
  • UGGGH – arrived Woods Hole/MBL; got dorm room key at 1am; woman in room not very happy; finally got other hot crummy dorm room; Ahh MBL

Day 2: Friday: Hanging out at MBL

Woke up at the Swope Dorms and, thanks to the lovely reception I got from the Housing Staff (see A squatter’s journey to the Marine Biological Lab (MBL) again for more detail), I was not very happy.  I went in to town to get a latte and something to eat and then made it over to the Microbial Diversity Course to hear a few talks and see some of the folks there.  Then I went back to my dorm room, packed up my stuff, abandoned Swope and went to the Sleepy Hollow Motor Inn just up the road, a bit out of town.  I had already called and they held a room for me (I tried the one place actually in town but they were full).  So I checked in, dumped my stuff and then walked back in to town.  I eventually ended up going to dinner with some of the course TAs and other personnel.

Here are some tweets from the day

Alas, was quite a bit tired from the horrible trip and bad housing experience so did not tweet much the whole day.  Here are some pics from the day:

View from my second room at Swope
View from my room of Eel Pond
View from my room – nice view – but room was unbearably hot even on a cool day.
Microbial Diversity course lab
Eel Pond again
The Kidd
Art around MBL
Fun chairs in the Candle House
Squid on a fence
More eel pond
Magical berries
Microbial mat
Skate babies

Day 3: Symposium

Saturday was the day for the genomics symposium I had come for.  The symposium was hosted by the Microbial Diversity Course and was focused on microbial genomics.  There were four speakers – me, Howard Ochman, Nancy Moran and Eugene Koonin.  I thought the symposium went quite well – each speaker did a good job of both complimenting and complementing the other speakers.   I hope the students liked it.

I spent many hours the night before and in the AM working on my talk, trying to fine tune it for the audience.  I grabbed a latte in the morning at a nice Woods Hole place, and eventually walked on over towards the lab.



I headed over to Swope and fortunately found a person from the course who told me where the talks were.  I grabbed some breakfast in the dining hall and then went to the room next door where the Symposium was going to be held.  I set up my laptop and alas noticed I had forgotten my Apple remote.  So I did an App Store search to see if my iPhone could serve as a remote for Keynote and it can (for 99 cents).  So I downloaded the App and got it working and was ready to go.
I got a nice introduction from Dan Buckley, one of the Course organizers and then gave my talk.  I think I went a bit fast in parts but people seemed to like it.  I got some good questions and then it was time for a break.  Anyway – here are my slides, which I posted on Slideshare: Eisen Talk for MBL Microbial Diversity Course
Then Howard Ochman gave a talk.  Here are some tweets from his talk:

  • Done with my talk at MBL for the Microbial Diversity course Symposium on Microbial Genomics – now listening to Howard Ochman
  • Howard Ochman discussing how genes in a bacterial genome w/ atypical composition are considered likely to have entered by lateral transfer
  • Ochman referencing classic paper by Sueoka “ON THE GENETIC BASIS OF VARIATION & HETEROGENEITY OF DNA BASE COMPOSITION” 
  • Ochman showing time course of the plot of genome size vs. # of genes for bacteria – all looked good 1kb=1 gene until M. leprae genome
  • Ochman quotes “Less than half of the genome contains functional genes but pseudogenes …. abound” 
  • Ochman: Why aren’t there lots of pseudogenes in most bacterial genomes? B/c there is a mutation bias towards deletions
  • Ochman referencing “Bacterial genome size reduction by experimental evolution”  re: deletion bias
  • Ochman making genetic drift personal: sometimes you pull out just the blue M&Ms, which of course you really don’t like 
  • Ochman referencing “The consequences of genetic drift for bacterial genome complexity” 
  • Ochman: an increase in genetic drift from reduced effective population size can lead to increase in Ka/Ks
  • Ochman discussing how effect of drift on bacterial genome size is opposite trend predicted in Lynch and Conery 2003

Then there was a little break for lunch.  After lunch I had an entertaining conversation with Howard Ochman about various topics.  And then we were back to talks.

Next up was Nancy Moran.  Here are my tweets from her talk:

  • Listening to talk by Nancy Moran about tiny bacterial genomes – she is discussing her work w/ now retired  prof. Paul Baumann 
  • Moran – discussing work of Allison Hansen in her lab on bacterial gene expression in bacteria containing cells in aphid gut
  • Moran discussing incredible diversity of insect symbionts that help hosts obtain nutrients from nutrient poor diets 
  • Moran discussing the Tremblaya genome which has recently shown up in Genbank 
    • @phylogenomics Tremblaya is awesome. John McCutcheon is the man – hope this is published soon.
    • @phylogenomics 58% GC in an insect symbiont – simply weird. McCutcheon talked about this at SGM Insect Symbiosis in Harrogate, UK in April.
    •  yes, high GC but it is related to organisms with even higher GC

Then Eugene Koonin. Here are my tweets from his talk:

  • Now listening to the one and only Eugene Koonin discussing evolution of archaea/bacteria at MBL Microbial Diversity course 
  • I note my start in genome evolution really came from reading papers by Koonin on helicases
  • Koonin showing figures from one of my favorite papers of his: … the emerging dynamic view of the prokaryotic world 
  • Koonin: Archaeal genomes are even more gene dense than bacterial genomes
  • Koonin: the majority of genes in bacterial and Archaeal genomes are part of conserved families
  • Koonin: most gene families show patchy phyletic patterns across bacterial and Archaeal genomes
  • Note – Koonin has more than 500 papers listed in Pubmed
  • Koonin : most of the universal genes in bacteria and archaea are involved in translation
  • Koonin describes “bureaucratic ceiling” to genome size b/c of exponential incr. in regulators vs. genome size – can’t get too big
    •  @phylogenomics Limit on “genome size”. He means gene number (which does correlate in bact/arch but not euk)
    •  Sorry .. He is only discussing bacteria and archaea … So here it does correlate w/ genome size
    •  indeed .. He was using gene number as his key feature
  • Koonin describing 1998 Aravind et al paper on Aquifex which was 1st report of massive gene transfer between bacteria / archaea
  • Side story: when Thermotoga genome paper came out (I was buried as middle author) Koonin called me, POd that we had not refd Aquifex paper
    •  yes but this was a bit of a big deal … Press coverage … Nature paper, etc etc …
    •  The funny part was . He was POd at me even though I was buried in the middle b/c he said I should know better …
  • I must say Koonin is giving a damn excellent talk on bacteria and Archaeal evolution
  • Koonin discussing how there is a central tree-like structure in the “forest of life” of trees of conserved genes
  • Koonin discussing this: Comparison of phylogenetic trees and search for a … 
  • Koonin: there is a strong signal of vertical evolution even among much lateral gene transfer, b/c transfer is mostly random
  • ATGC: a DB of orthologous genes from closely related prokaryotic genomes & a research platform for microevolution
  • Koonin: “There is such a thing as a prokaryote” (gives many reasons)
  • Koonin discussing my favorite topic these days: CRISPR-CAS system
  • Koonin discussing his paper on early finding of crispr elements
  • Prediction: A Nobel in the near future will go for work on CRISPR/CAS system of adaptive immunity in bacteria / archaea
  • Koonin discussing the journal he helped start called Biology Direct which is both open access and has open review
  • Koonin has a new Book: The Logic of Chance: The Nature and Origin of Biological Evolution: ProQuest Tech Books

After Koonin was done, everyone dispersed.  I wandered around and took some pics:

Magical mushrooms
Sloan Urinal (inside joke about http://microbe.net)
????

I went back to my motel room for a little bit and then headed down to Eel Pond for a Course BBQ.

Deck for party
Deck for party

Photosynth stitched-together pic
Eel pond

Party
Party
Party

I then headed into town where my friend Nipam Patel was having a party for the Embryology Course he was teaching.  I hung out at his house for a bit and then went back to my room.

Day 4: Home

Got up late.  Checked out.  Wandered into town with my suitcase.  Took some pics.

And after some internal debate, I decided to switch my flight to return that day rather than go visit relatives in Boston (sorry Diana, Hal, I just wanted to get home).  So I took the Bonanza Bus to Logan and discovered that Karl Stetter was also on the bus.  I tried to watch the US-Brazil women’s soccer game on my iPad using the wireless they have on the bus but it was choppy.  So I just followed updates on the game – and even that was exciting.

Here are my tweets from the day:

Twisted tree of life award: @Discovermag for article on Lynn Margulis

Well, try, for a minute, to ignore the fact that in “Discover Interview: Lynn Margulis Says She’s Not Controversial, She’s Right | Evolution | DISCOVER Magazine”, Discover Magazine is in essence promoting some of the refuted ideas Lynn Margulis has about HIV. Sure, they hint in part that they think she is over the top, but they also give her a soapbox to spout some of her latest absurdities on HIV and such. I would suggest you don’t even read the main part of the Discover article. Just read Tara Smith’s discussion of it: Margulis does it again : Aetiology. Margulis should not be given such prominence in a magazine like Discover. But that is not what I am here to write about. I am here to point out that Discover also sets up a red herring for Margulis. In the beginning of the article, it is written:

“A conversation with Lynn Margulis is an effective way to change the way you think about life. Not just your life. All life. Scientists today recognize five groups of life: bacteria, protoctists (amoebas, seaweed), fungi (yeast, mold, mushrooms), plants, and animals. Margulis, a self-described “evolutionist,” makes a convincing case that there are really just two groups, bacteria and everything else.”

Seriously? Scientists today do not recognize five groups. Scientists today have moved past that to recognize and/or argue about bacteria, archaea and eukaryotes – the three domains of life. These three groups were first proposed in 1977 by Carl Woese and colleagues. Did Discover somehow miss the last 34 years of science? WTF? For setting up such an evolutionary red herring in this painful interview with Lynn Margulis, I am giving Discover Mag my coveted “Twisted tree of life award”. Past winners are: