Guest post from Antarctica: Joe Grzymski (@grzymski) on "The Story Behind Nitrogen Cost-Minimization"

Well, this is getting really fun. I have been doing “The Story Behind the Paper” posts for my own papers for a while and recently opened this up to guest posts. And the one today is coming to us from the true wilds – Antarctica. Joe Grzymski (aka @grzymski on Twitter) is out there doing field work (yes, microbiologists have the best field sites …). For more on the field project see the Desert Research Institute’s “Mission Antarctica” site. Joe responded to my request for more guest posts and wrote up a really nice discussion of a recent open access paper of his from the ISME Journal. If anyone else is interested in writing a guest post on an open access paper or an issue in open access, let me know … without any further ado – below is Joe’s post



I thoroughly enjoy reading Jonathan’s posts detailing – far beyond what can possibly be included in published papers – the who, what, where, when, why and how of science. The story behind the potential fourth domain of life article in PLOS ONE provides great detail about how science is done. After reading Matthew Hahn’s insightful history and commentary on his ortholog conjecture paper I was happy to reply to the request for more “stories” and am chiming in from Antarctica (where I am currently doing field research) to discuss the story behind our recent paper in ISME J, “The significance of nitrogen cost minimization in the proteomes of marine microorganisms”. I hope it will provide another example of how a lot of science is lost in final, streamlined, published versions. Also, it is work that was largely done by an undergraduate and was vigorously and carefully reviewed – the improvements and expansion of ideas prompted by great reviewers highlight the best of the review process. What started out as a short two-page paper morphed into a larger piece of research – not something you can properly detail in a manuscript.

What was the origin of the idea?

The story behind this paper begins in 1997 when I was in graduate school at Rutgers University. Paul Falkowski joined the faculty right around the time when he published a seminal paper, “The evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean.” Paul’s office was across from an office I shared with Jay Cullen (who will factor into the story later); Paul was on my committee and influential in how and what I studied in grad school and as a postdoc. He constantly kept us on our toes (to say the least). Many of the implications of our recent paper were guided by his thoughts and original work on evolution of the nitrogen cycle and many papers on the functional and ecological factors that dictate the structure of phytoplankton communities. There are many papers here by Paul and the awesome Oscar Schofield – my primary dissertation adviser. Incidentally, I overlapped with Felisa Wolfe-Simon at Rutgers for a few years; she was in the science news recently [#arseniclife], and we had common advisers.

Paul’s paper was pre-genomics – but its scope and breadth are strengthened by recent work on isolates, environmental genomes and transcriptomes from the ocean. Simple mass balance says that the reason why we have oil buried deep in the earth and oxygen in the atmosphere is that photosynthesis (net carbon fixation and oxygenation of the atmosphere) exceeds respiration. Over long periods of time, organisms draw down CO2, and it gets sequestered from the atmosphere. In his paper, Paul details an inextricable link between the ratios of nitrogen fixation and denitrification (across geological periods) and the potential drawdown of CO2 by particulate organic carbon (namely, large sinking diatoms). That is, if nitrogen fixation is abundant and denitrification is zero, there is more available inorganic nitrogen (in the form of nitrate) in the surface ocean for phytoplankton to utilize, and carbon sequestration increases. His paper further details why fixed nitrogen is limiting in the ocean surface across geological scales. It boils down to iron limitation, the specialization required to harness the beastly, triple-bond-cracking but woefully inefficient nitrogenase enzyme (which has a high Fe requirement), and the fact that denitrification, by contrast, evolved easily and multiple times. All of this is articulately summarized here.

How did this work advance?

Fast forward to 2001 and publication of the paper by Baudouin-Cornu et al. In this paper, links between environmental imprinting from fluctuating nutrient availability and atomic composition of assimilatory proteins are quantified. Using genome sequences from E. coli and S. cerevisiae, the authors show that carbon and sulfur assimilatory proteins have amino acid sequences that are depleted in carbon and sulfur side chains, respectively. This makes sense. Proteins high in carbon or nitrogen would hardly provide added fitness to an organism that often struggles to find enough of the nutrient to satisfy other fundamental cellular processes. Similar logic also explains why organisms tend to utilize smaller amino acids more frequently than larger ones: it takes more ATP to make a tyrosine than an alanine. Conversely, the pressure to “cost minimize” is less in organisms, like gut-dwelling microbes, that have easy access to amino acids. It is not a perfect rule, but most of the time thermodynamic arguments explain a lot about why organisms do what they do. Fast forward again to Craig Venter’s genomic survey of select surface ocean sites (GOS). This (along with other, newer sequence data sets) provided access to genomic information on organisms that inhabit various surface ocean biomes and, crucially, are largely difficult to isolate in pure culture.

What motivated the writing of the paper?

Last summer, I was sitting in my office writing a proposal. I can’t remember the specific topic, but I was thinking about cost-minimization mostly from the perspective of building proteins in cold environments and the challenges organisms face when it is cold: there is little access to organic carbon (food), and other environmental conditions hamper optimal living. I was re-reading Baudouin-Cornu, and there is a specific sentence in the paper in which the authors hypothesize that the phenomenon of cost-minimization might be a broader evolutionary strategy in resource-limited environments. I figured that organisms that did well in the oligotrophic parts of the ocean probably had mechanisms to reduce nitrogen usage, and an easy place to start reducing nitrogen is by not making so many proteins, or at the very least reducing the usage of arginine, histidine, lysine, asparagine, tryptophan and glutamine – amino acids with at least one added nitrogen on their side chains.
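The bookkeeping behind that intuition is simple enough to sketch. Here is a small Python illustration (my own, not code from the paper) of the nitrogen-atoms-per-residue-side-chain metric in the spirit of Baudouin-Cornu et al.; the side-chain nitrogen counts are standard amino acid chemistry, and the function name is just for illustration.

```python
# Nitrogen atoms in each amino acid side chain (the backbone nitrogen,
# shared by all residues, is excluded).
SIDE_CHAIN_N = {
    "R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1,
    # the other 14 residues carry no side-chain nitrogen
    "A": 0, "C": 0, "D": 0, "E": 0, "F": 0, "G": 0, "I": 0,
    "L": 0, "M": 0, "P": 0, "S": 0, "T": 0, "V": 0, "Y": 0,
}

def n_arsc(protein: str) -> float:
    """Average nitrogen atoms per residue side chain for one protein."""
    residues = [aa for aa in protein.upper() if aa in SIDE_CHAIN_N]
    return sum(SIDE_CHAIN_N[aa] for aa in residues) / len(residues)

# An arginine/histidine-rich stretch costs far more N than a glycine/serine one
print(n_arsc("RRHK"))  # 2.25
print(n_arsc("GGSA"))  # 0.0
```

Averaging this over every predicted protein in a genome (or metagenome) gives the per-site values we compared across GOS stations.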

This is a good spot to introduce my co-author, Alex Dussaq.

Co-author, Alex Dussaq

Alex completed his honors undergraduate work in mathematics and biochemistry and was working with me on some coding and analysis projects. To follow Matthew’s example, the conversation that started this paper went like this:

Joe: Alex, I have an interesting idea I want to discuss in a proposal… do you think you can download all the GOS data and calculate the nitrogen, C, H and S atoms per residue side chain as in this paper (hand him Baudouin-Cornu) and then correlate those values with chlorophyll (a proxy for phytoplankton and thus primary productivity), NO3 and Fe. This would be just one figure in the proposal.

Alex: OK, sure that should be pretty easy.

Joe: My proposal is due next week so I need the numbers quickly.

Alex: Yeah, yeah.

Alex codes more easily than most people write in their native language. By the way, Alex has moved on to a combined M.D./Ph.D. program at UAB through which he hopes to combine genomics research with new approaches to medicine. I have no doubt he will do unbelievably well in science.

I think that downloading organized data was initially more difficult than it should have been – we spend so much money generating data and so little taking care of it – but we had average values after a few days for several oligotrophic GOS sites and some coastal ocean GOS sites that were convincing enough to put in the proposal. Unfortunately, there is little good metadata – especially physical and chemical characterization of the GOS sites – so we used the “distance to continental land mass” as a proxy for nitrate concentration and oligotrophy (this stung us at first in review). After a week, Alex had analyzed all the GOS data and a few important isolated, single-organism genomes that factor in the story. After a little less than a month, we had a draft of a two-page brevia that we submitted to Science. It was a simple story that showed data from coastal and open-ocean GOS sites. We found a clear relationship between the frequency of nitrogen atoms in side chains of proteins and distance from continental land mass (a proxy for nutrient availability, as there are lots of nutrients running off our land). The main conclusion of the paper was that organisms living in oligotrophic oceans tend to have reduced nitrogen content in their proteins. Kudos to Alex for some great work.

What was the larger context for the initial findings?

We tried to write the paper from a broader evolutionary and biogeochemical perspective (and used the aforementioned paper by Paul Falkowski as a model). We talked about the implications of organisms in the ocean being under selective pressure to cost minimize with respect to nitrogen. I’d be happy to share the original submission with anyone who wants to see the evolution of a paper; just contact me. I’d post it here, but Jonathan might charge me for the bytes given how long this is turning out to be. Great reviews make good, decently executed stories a lot better.

How did the reviewers react?

When reviews of a paper are longer than the original submission, you have an indication that the paper prompted some thought. We received three comprehensive reviews of a two-page paper that contained one main figure and some supplemental material. Given that I didn’t think we could spend much time on the subject, we attempted to be brief – too brief, especially when compared to the final open access result in ISME. Next, I’ll review some criticisms of the nitrogen cost-minimization hypothesis (having our paper handy will be helpful):

1. Nitrogen cost minimization by simply looking at the predicted proteomes of organisms or environmental genomes assumes that all proteins are made de novo when salvage pathways and dissolved free amino acids (DFAAs) and higher mol. weight/energy compounds are utilized.

Looking at predicted proteomes is indeed a simplification, in much the same way that analyzing codon usage frequencies was a simple way to identify highly expressed genes with varying degrees of certainty. No doubt, organisms have multiple methods to acquire the energy they need – especially under rate-limiting conditions. For example, the pervasive transfer of proteorhodopsin to many different marine microbes presumably helps overcome some nutrient limitation situations by providing added energy from the sun (in the form of a proton gradient), perhaps to aid in transport. The predicted proteome analysis just says that organisms that live in low N waters have lower frequencies of N in their side chains than organisms in the coastal ocean (or in, say, a sludge metagenome). It doesn’t discount the importance of gene expression, the fact that cells are not “averages” of the genome, etc. None of that really fits into a two-page paper.

2. In our paper, we used the diazotroph Trichodesmium as a model open-ocean organism that was severely N-cost-minimized and compared this to similar success of the SAR11 organism, Pelagibacter ubique. We were criticized because N-fixation should help an organism overcome any N stress.

This was clarified in our next, longer draft. As was shown in the elegant paper by Baudouin-Cornu, assimilatory proteins reflect the “history” of an organism competing for the very atom or molecule they assimilate. Thus, Trichodesmium would hardly bother to break the triple bond of dinitrogen, at a cost of 16 ATP per N2, to make ammonia if it were swimming in a vat of inorganic nitrogen. Or put differently, the nitrogenase operon should be nitrogen-cost-minimized, reflecting the assimilatory costs of acquiring N. This is, indeed, the case.

3. Why not calculate the bio-energetic costs associated with changes in N content?

We ended up doing this by proxy in the ISME paper. But it raised a far more interesting point that we pursued in further detail and a chicken/egg argument that was pursued subsequently by another reviewer. If you simply plot N atoms per amino acid side chain versus GC, you get a relationship that looks like this:

This is neither surprising nor novel. But it highlights well the “cost” of having a high GC versus low GC genome in terms of added nitrogen atoms in proteins. The data plotted are all from marine microbes, but the result is universal.

Furthermore, if you plot GC versus median mass of amino acids in the predicted proteome of organisms you get this:

The relationship between GC and the average mass of amino acids is strong. And this is one of the places where the story gets interesting. Organisms that have low GC genomes have inherently heavier proteins… i.e., all resources being equal and all metabolic pathways being the same (rare, I know), a low GC organism is going to invest more ATP and NADH to make the same protein as a high GC organism. Let’s ignore why this might not matter if you are Helicobacter pylori and quite comfortable acquiring amino acids from your host, and focus on ocean microbes. There is a trade-off for all organisms simply based on the GC content of the genome. If you have a low GC genome, you have (on average) larger proteins and less N in your proteins than a high GC genome. Is this trade-off the reason why many of the most successful organisms in the ocean have low GC content? Probably not, but it has to be considered a contributing factor. Constant low nitrogen has to be a major selective pressure given the recent biogeochemical history of the ocean, as pointed out in Falkowski (1997). In the final version of the ISME paper, we model differences in the nitrogen budgets of various “model” organisms based on some trade-offs. It was a decent first step, showing that N-cost minimization actually matters.
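To see why GC content drags protein composition in both of these directions, here is a toy model (my own construction, not the analysis from our paper): assume a genome with a given GC fraction and independent, unbiased base usage, translate every non-stop codon under the standard genetic code, and compute the expected side-chain nitrogen and average residue mass. The codon table and residue masses are standard; the uniform-base assumption is a deliberate simplification.

```python
# Standard genetic code, compactly encoded: amino acids in TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {a + b + c: AMINO[16 * i + 4 * j + k]
         for i, a in enumerate(BASES)
         for j, b in enumerate(BASES)
         for k, c in enumerate(BASES)}

# Side-chain nitrogen atoms (residues not listed have none).
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}

# Average residue masses in daltons (amino acid minus water).
MASS = {"G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
        "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
        "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
        "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21}

def expected_residue_stats(gc: float):
    """Expected side-chain N and residue mass for a random genome with
    the given GC fraction, assuming independent, unbiased base usage."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    n = m = total = 0.0
    for codon, aa in CODON.items():
        if aa == "*":  # skip stop codons
            continue
        prob = p[codon[0]] * p[codon[1]] * p[codon[2]]
        total += prob
        n += prob * SIDE_CHAIN_N.get(aa, 0)
        m += prob * MASS[aa]
    return n / total, m / total  # renormalize over coding codons

low_n, low_mass = expected_residue_stats(0.30)    # low-GC genome
high_n, high_mass = expected_residue_stats(0.70)  # high-GC genome
print(high_n > low_n)        # more side-chain N per residue at high GC
print(high_mass < low_mass)  # heavier residues at low GC
```

The driver is visible in the code table itself: GC-rich codons are enriched for arginine (three side-chain nitrogens) and the light residues glycine, alanine and proline, while AT-rich codons favor heavier, nitrogen-poorer residues like phenylalanine, isoleucine and tyrosine.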

4. How do you make a quantifiable association between organisms that are so diversely located in space/time and environmental forcing like N availability?

This is a fundamental question in microbial ecology (example, and another). How do we tackle why and when organisms are going to be abundant? Here, I think there are two approaches worth taking. First, what specific genome/metabolic characteristics determine success under specific conditions? For example, what are the characteristics of SAR11 that enable them to “thrive” in oligotrophic waters while their alphaproteobacteria neighbors, the Roseobacter, tend to do better in waters that are more hyper-variable (like the coastal ocean)? Lauro et al. define the characteristics that can be found in genomes of oligotrophic versus copiotrophic organisms. Second, given specific global biogeochemical patterns and environmental forcing constraints, how do we predict organisms will respond? Put in the context of nitrogen cost-minimization, we can ask, “Over geological time, will low N waters continue to exert pressure such that either organisms with N-cost-minimized genomes will thrive or organisms will be forced onto a downward GC-content trajectory to ease some of this burden?” In our paper, we suggest that the evolutionary history of organisms hints at the impacts nutrient limitations are having on organisms. And this, of course, is by no means new. A beautiful example (albeit not open access).



The divergence of the cyanobacteria Synechococcus and Prochlorococcus during the rise of the diatoms – the most important phytoplankton group in the ocean – suggests the impact of biogeochemical changes on marine microbes. The diversification and proliferation of diatoms in the oceans marginalized cyanobacteria. Diatoms are the workhorses of the ocean biogenic carbon cycle – in comparison to cyanobacteria, they grow quickly and sink faster – thus they sequester fixed CO2, N and Fe that all other surface ocean microbes need. The diatoms changed the ocean, thus putting pressure on cyanobacteria. A result (because many other things also happened) was the genome streamlining and niche adaptation of the lineage. The best example is the high-light-adapted MED4 strain of Prochlorococcus. This particular strain has a small genome, low GC and is nitrogen-cost-minimized, as detailed in our paper. Diatoms marginalized cyanobacteria, forcing them into specific niches (e.g., high-light, low Fe, low N, low P) where they are successful and well adapted (like these clades that live in iron-poor water).

Where are we heading?

What are the implications of cost-minimization in the genomes of ocean microbes? Could it alter the overall nutrient pools in the surface ocean (and thus affect the potential CO2 draw down by phytoplankton)? These are questions we are now pursuing using modeling approaches in an attempt to bolster our understanding of biogeochemistry through genomics and microbial ecology. We are teaming up with Jay Cullen, a chemical oceanography professor, good friend and super smart guy to figure out if cost-minimization and other metabolic changes in microbes might be having more of an effect on biogeochemical cycles than we think. Stay tuned.

Ooh — all three at once #baseball


I love MLB’s picture-in-picture function


Story behind the paper: small RNAs in diatom (interview w/ Andrew Allen)

Here is another “Story behind the paper“. This one focuses on the following paper: Norden-Krichmar, T.M., Allen, A.E., Gaasterland, T., Hildebrand, M. (2011) Characterization of the small RNA transcriptome of the diatom, Thalassiosira pseudonana. PLoS ONE 6(8): e22870. doi:10.1371/journal.pone.0022870
I wrote some questions up for Andrew Allen, one of the authors.  I note I did this before my “new” system of inviting authors to write guest posts directly themselves.  Not sure which approach is better but guest posts are certainly easier for me so I will probably do that more.

1.     What is the history behind this work?  How did it start?  Why did you do it?
These studies on small RNA in diatoms are the result of collaboration between my group at the J. Craig Venter Institute (JCVI) and Mark Hildebrand’s group at Scripps Institution of Oceanography (SIO). Each lab group is interested in the ecology, evolution, and physiology of diatoms. More specifically, we would like to know more about how diatoms sense and respond to environmental signals. Therefore we are interested in mechanisms of transcriptional regulation in diatoms and other microalgae. An earlier study suggested that cytosine methylation is an important mechanism for repression of transcriptional activity of retrotransposons, and associated mobility, in diatoms. In response to stress, nitrogen stress especially, long terminal repeat retrotransposons (LTR-RTs) display decreased levels of cytosine methylation (hypomethylation) and elevated transcriptional activity.
Maumus, F., Allen, A.E., Mhiri, C., Hu, H., Jabbari, K., Vardi, A., Grandbastien, M.A., Bowler, C. (2009). Potential impact of stress activated retrotransposons on genome evolution in a marine diatom. BMC Genomics 10:624.
Classically, small RNAs are known to play a key role in triggering gene silencing by DNA methylation. Short interfering RNAs (siRNAs) have also been found to play a role in silencing retrotransposons and other repeat elements.
Therefore we were interested in investigating the small RNA repertoire of diatoms. Our first experiments were based on 454 sequencing of libraries constructed from small RNA purified from the diatom Thalassiosira pseudonana. It was clear to us that, despite promising results, much deeper sequencing would be required for a meaningful characterization of the small RNA transcriptome. We used ABI SOLiD sequencing to further explore the diversity and expression of small RNAs in T. pseudonana. Although deep sequencing was ultimately necessary to obtain sufficient coverage and resolution for statistically sound analyses, the SOLiD and 454 data were remarkably congruent.
At the time these studies were being conducted, 2009, there were some specific challenges associated with analyses of the SOLiD small RNA data. Extracting all types of small RNAs from a non-standard organism was not straightforward.
Initial processing of the SOLiD data using commercial products, such as ABI’s Small RNA Pipeline and CLCbio’s CLC NGS Cell reference assembly software, yielded an average of approximately 6% of reads aligned to the T. pseudonana genome. For ABI’s Small RNA Pipeline, even when omitting the filtering step against known miRNAs from the Sanger miRBase, the software gave a higher priority to matching the adapter sequences than to matching the genome, in order to produce small RNAs in the miRNA size range. Similarly, because CLCbio’s CLC NGS Cell program was not able to align any sequence less than 27 nucleotides in length, and many small RNAs are in this size range, it also had to be abandoned in this study.
The methodology presented in this study provides the steps necessary to discover all types of small RNA genes in next generation sequence data, and to perform a comparative analysis of different libraries of sequence data. Briefly, an approach was necessary to extract the small RNA sequences from the constant 35 nucleotide colorspace format SOLiD data, convert the colorspace data to its basespace equivalent, and map the sequences to the reference genome.  The colorspace data, which is a numerical representation of the color produced during sequencing for each successive two-nucleotide pair, was first converted to its basespace equivalent using CLCbio’s tofasta software.  The basespace format sequences were then aligned to the T. pseudonana reference genome with BLAST, acting to simultaneously determine the alignment locations and trim the spurious adapter nucleotides from the ends of the small RNA sequences.  This method yielded a recovery rate of 22% of the reads aligned to the genome, which is two or three times more reads than the ABI SOLiD Small RNA pipeline and CLCbio’s NGS Cell program, thereby producing a large data set for further analysis.
2. What is next?

We would like to establish improved conceptual integration for the role of small RNAs in various aspects of diatom evolution, metabolism, and biochemistry. More highly resolved expression patterns of small RNAs in response to specific environmental conditions will be required to make associations between specific small RNA loci and specific cellular processes. It seems likely that copia-type retrotransposons play a major role in diatom genome evolution through promoting genome rearrangements and modification of gene expression levels through displacement and insertion of various promoter binding sites. We would like to attain a better understanding of the role of small RNAs in mediating transposon occurrence and transcriptional and insertional activity. For example, in relation to retrotransposons, is the role of small RNAs strictly relegated to defense and silencing, or do small RNAs also play a role in fostering establishment of transposons that ultimately have a positive impact on fitness?
3. Any interesting stories about the project like fights among authors (OK, maybe not that) – but anything more on the personal side of things?
The lead author of the study, Trina Norden-Krichmar, a bioinformaticist, did a lot of the lab work for this project. Diatom culturing, RNA purification, running gels, 454 small RNA library construction, PCR, TOPO cloning, Northern blots, etc. are somewhat unusual activities for most bioinformaticians. Interestingly, prior to earning a PhD, Trina was a computer programmer who enjoyed open ocean swimming at the La Jolla Cove. As a result of this recreational activity she was motivated to go back to school for a PhD in Marine Biology. Trina also authored a paper on small RNAs in the marine invertebrate Ciona.
4. Can you send links to any other information of value including Authors web sites
My JCVI
My Mendeley (which has all PDFs mentioned here)
Mark H.
Terry G.
Other papers of interest (e.g., some recent Nature paper by you)
Other recent studies of interest include a publication in Nature earlier this year, Evolution and metabolic significance of the urea cycle in photosynthetic diatoms.
Evolution of intracellular urea synthesis by the ornithine-urea cycle (OUC) is classically known to have facilitated a wide range of physiological innovations and life history adaptations in vertebrates. For example, urea synthesis enables rapid osmoregulation in elasmobranchs (sharks, skates, rays) and bony fish, and ammonia detoxification in amphibians and mammals, which was likely a prerequisite for life on land. Ruminants and some hibernating mammals recycle nitrogen between the liver and gut through urea.
Evolutionarily, it was unusual and highly unexpected to find a gene encoding the OUC form of carbamoyl phosphate synthetase (CPS) in diatoms. CPS evolution is a fascinating story with many chapters of gene duplication and fusion. The origin of the ornithine-urea cycle can be traced to ancient duplication and subsequent neofunctionalization of an ancestral eukaryotic carbamoyl phosphate synthase, CPSII. CPSII, renamed pgCPS in this study to reflect its function and substrate (pyrimidine metabolism and glutamine), is an ancient eukaryotic enzyme that resulted from fusion of bacterial amidotransferase and synthetase subunits. Interestingly, there is significant internal similarity within the synthetase domain, which is the result of ancient duplication of a kinase domain. It has long been held that pgCPS duplicated in early diverging metazoans to form ugCPS (urea cycle, glutamine), which is targeted to mitochondria. Subsequently, in vertebrates, unCPS (urea cycle, ammonium) appeared and provided the foundation for the modern vertebrate urea cycle. Therefore, the discovery of unCPS in unicellular stramenopile and haptophyte algae was highly unexpected. Also, physiologically, in animals the urea cycle is a catabolic pathway that ultimately serves to export fixed nitrogen (in the form of urea) from cells, so it was somewhat puzzling and conceptually challenging to imagine a role for the urea cycle within the context of photosynthetic cells. In addition to either glutamine or ammonium, CPS utilizes inorganic carbon in the form of HCO3 and therefore represents a form of carbon fixation as well. In diatoms, it appears that the urea cycle serves as a distribution and repackaging hub for inorganic carbon and nitrogen and is particularly important for redistribution and turnover of cellular nitrogen following episodic pulses of nitrate, which occur during oceanic upwelling events.
Although chloroplast- and bacterial-derived gene transfer to the diatom nuclear genome has been described, very little is known about the contribution of the secondary endosymbiotic host (exosymbiont) to diatom metabolism. Results of this study indicate that the secondary endosymbiotic host genome made important physiological and biochemical contributions to the diatom nuclear genome, sufficient to significantly distinguish secondary endosymbiotic algae from plants and green algae.
Also, three studies have been published this year related to carbon metabolism and the carbon concentrating mechanism (CCM) of diatoms. The occurrence of efficient CCM(s) in diatoms has long been hypothesized as a result of the relatively high affinity of diatom cells for inorganic carbon compared to the much lower affinity of the enzyme RubisCO for CO2. In other words, in order to overcome RubisCO inefficiencies, such as slow turnover and a propensity to fix O2 (i.e., photorespiration), there has been strong evolutionary selection for cellular adaptations that enable elevated CO2 at the site of fixation by RubisCO. Also, over geological time, atmospheric concentrations of CO2 have decreased while O2 has increased, presumably strengthening selection for CCMs in productive modern microalgae.
A manuscript by Hopkinson et al. published in PNAS is based on mass spectrometric measurements of passive and active cellular inorganic carbon fluxes in wild-type and chloroplast carbonic anhydrase (CA) overexpression cell lines of the diatom Phaeodactylum tricornutum. Carbonic anhydrases (or carbonate dehydratases) are metalloenzymes that catalyze the rapid interconversion of carbon dioxide and water to bicarbonate and protons. Model simulations of these fluxes suggest that, due to membrane permeability to CO2, only around one-third of the inorganic carbon transported from the cytoplasm into the chloroplast is fixed photosynthetically; the rest is lost by CO2 diffusion back to the cytoplasm. Therefore, in order to achieve the CO2 concentration necessary to saturate carbon fixation, it is hypothesized that CO2 is most likely concentrated within the pyrenoid, a specialized non-membrane-bound proteinaceous structure within the chloroplast that contains high levels of RubisCO.
Hopkinson, B.M., Dupont, C.L., Allen, A.E., Morel, F.M.M. (2011). Efficiency of the CO2-concentrating mechanisms of diatoms. Proceedings of the National Academy of Sciences of the United States of America. 108(10):3830-7.
In a paper by Tachibana et al. published in Photosynthesis Research, nine and thirteen carbonic anhydrases (CAs) were identified and experimentally localized in the marine diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana, respectively. Immunostaining experiments show that PtCA1, a β-CA, is localized to the central part of the pyrenoid in the chloroplast. Other CAs are shown to be localized to the periplastidal compartment, chloroplast endoplasmic reticulum, and mitochondria in P. tricornutum and the stroma and periplasm of T. pseudonana.
Tachibana, M., Allen, A.E., Kikutani, S., Endo, Y., Bowler, C., Matsuda. (2011). Localization of putative carbonic anhydrases in two marine diatoms, Phaeodactylum tricornutum and Thalassiosira pseudonana. Photosynthesis Research. Advance Access published March 2 2011, doi:10.1007/s11120-011-9634-4
A paper published by Allen et al. in Molecular Biology and Evolution (open access) examines the functional diversification of fructose bisphosphate aldolase (FBA) genes in diatoms. Class I and class II FBAs are involved in Calvin-Benson cycle reactions and glycolysis. Patterns of FBA evolution have been useful for questions related to chloroplast acquisition and evolution in primary and secondary endosymbiotic algae. The universal occurrence of class II FBAs in chromalveolate (diatoms, dinoflagellates, haptophytes and cryptophytes) plastids has been interpreted as evidence for chromalveolate monophyly and a single origin for the secondary plastid of red algal descent. In this new paper, Allen et al. demonstrate that class I and class II FBAs are localized to the diatom pyrenoid. Class II pyrenoid-localized FBA appears to be the result of a chromalveolate-specific gene duplication event. The significance of FBA localization in diatom pyrenoids is not fully understood, but enzymatic activity and gene transcription appear significantly enhanced under periods of iron (Fe) limitation, when photosynthesis is somewhat down-regulated. The authors suggest that pyrenoid localization of some Calvin cycle components might provide a regulatory link between CCM and Calvin cycle activity.
Allen, A.E., Moustafa, A., Montsant, A., Eckert, A., Kroth, P., Bowler, C. (2011). Evolution and functional diversification of fructose bisphosphate aldolase genes in photosynthetic marine diatoms. Molecular Biology and Evolution. Advance Access published September 8, 2011, doi:10.1093/molbev/msr223

Blast from the past: video of a talk I gave in 2006 #metagenomics

Just re-found this video and posted it to YouTube. It is from a talk I gave at the first "International Metagenomics Meeting" in 2006.

I think one may still be able to view videos from the CalIT2/UCSD page here. But I thought it might be better to have this talk on YouTube than at the CalIT site so I posted it … hope they don’t sue me.

Note – I wrote a blog post about the meeting here:
The Tree of Life: Metagenomics 2006

Once again, using nice "Tree of Life" video from Yale Peabody Museum for #UCDavis Course


I think I have written about this before but here goes again.  There is a nice “Tree of Life” video from the Peabody Museum that is now on Youtube and also their web site that is definitely worth a look for people interested in phylogenetics and the tree of life. It includes Michael Donoghue, Scott Edwards, David Hillis, Tandy Warnow and Charles Davis.

Crosspost from http://microbe.net: A very misleading “bacteria in buildings” advertisement presented as “news”

Am crossposting this from http://microbe.net where I posted it earlier. See original post here: A very misleading “bacteria in buildings” advertisement presented as “news”
Wow this “story” (which is really an ad) is just so incredibly bad I do not know what to say: Dangerous Bacteria Isolated in Healthcare HVAC Evaporator Coils. I do not even know where to begin with criticism so I will just go step by step through some of the advertisement.
1. Title: Dangerous Bacteria Isolated in Healthcare HVAC Evaporator Coils
There is no evidence that the bacteria being looked at here are dangerous.
2. First sentence ”A recent study suggests that doctors may want to monitor the environmental condition of their air conditioners evaporator coil before surgery to help prevent the spread of bacterial infections”
No evidence is presented anywhere that monitoring AC coils has any even remote potential value here.
3. Second sentence: Dr. Rajiv Sahay, Laboratory Director at Environmental Diagnostics Laboratory (EDLab) and his colleagues sampled evaporator coils in healthcare air handling systems and isolated Pseudomonas aeruginosa, a known nosocomial pathogen.
Well, Pseudomonas aeruginosa is indeed a known pathogen.  However, there is no evidence presented that all the things they detect are indeed pathogenic/virulent.  In fact, later in the article they report their results as being for “Pseudomonas sp” which suggests that their typing was very broad.  It is very possible that many of the cells they detected are not pathogenic.
4. Ignore the middle part.  It is just saying that Pseudomonas aeruginosa can be nasty in compromised patients.
5. They then go on to discuss their study more “In the study, over 560,000 colony forming units (CFU)/gram of Pseudomonas sp were isolated from deep within the evaporator coil system.”
What study?  No data is presented.  No methods.  No results.  Nothing.
6. They then say “Potential aerosolization of these micro-organisms from the infested coil is immense due to a discharge of air stream with 6 miles/hours (commonly observed) across the evaporator coils”
Not so sure about that.  Would have been much better to study ACTUAL aerosolization.
7. Then we find out that the person who conducted the study, Dr. Rajiv Sahay, is also the one selling the cleaning service to clean your air coils.  That does not instill confidence in me.
So a person selling HVAC cleaning reports unpublished results that they claim suggest if you do not clean your HVACs in hospitals you put all your patients at risk.  I am on board with the need to study microbes in hospitals more.  I am on board with the potential risks of microbes in AC systems.  I am not on board with not presenting data, and with getting the science wrong.

Special Guest Post & Discussion Invitation from Matthew Hahn on Ortholog Conjecture Paper


I am very excited about today’s post.  It is the first in what I hope will be many – posts from authors of interesting papers describing the “Story behind the paper“.  I write extensive detailed posts about my papers and also have tried to interview others about their papers if they are relevant to this blog.  But Matthew Hahn approached me recently about the possibility of him writing up some details on his recent paper on the functions of orthologs vs. paralogs.  So I said “sure” and set up a guest account for him to write up his comments and details of the paper.  


For those of you who do not know, Matt is on the faculty at U. Indiana.  He was a post doc at UC Davis so I have a particular bias in favor of him.  But his recent paper has generated some controversy (I posted some links about it here).  So it is great to get some more detail from him.  In addition, I note, I am also using this approach to try and teach people how easy it is to write a blog post by getting them guest accounts on Blogger and letting them write up something with links, pictures, etc.  So hopefully we can get more scientists blogging too.


Anyway – without any further ado – here is Matt’s post:

———————————————————————–
Following Jonathan’s excellent example of how explaining the history of a project helps to illuminate how the process of science actually happens, I thought I’d start by giving a bit of history behind our study, and the paper that we recently published in PLoS Computational Biology (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002073). And then I’ll address the critics…
How this all got started
It all started a bit more than three years ago, in the summer of 2008. Pedja (as Predrag Radivojac is known to friends) was giving a talk to a group of us on protein function prediction that he also presented as a tutorial at the Automated Function Prediction SIG at ISMB 2008. Pedja and I had already collaborated on a small project involving the evolution of phosphorylation sites, but I really had no idea about his work on function prediction, and little idea in general about how function prediction was done. Reviewing different ways to accomplish transfer-by-similarity, he eventually got around to evolutionary (phylogenomic) approaches. Here is what I remember of this specific exchange during his talk:
Pedja: …and of course these methods only use orthologs for prediction, because orthologs have more similar functions than do paralogs.
me (from audience): Who says?
Pedja: Umm, you say. I mean, the evolutionary biologists say.
me: No, we don’t. I don’t know of any data that says any such thing.
Pedja: Whatever, Matt. We’ll talk about this later.
Well, we did talk about it later, and it turned out that although this claim is made in tons of papers, there is basically no data to support it. In the best cases a real example of one gene family will be cited, but there are very few of these. In the worst cases, the authors will just cite some random paper about gene duplicates (or Fitch’s original paper defining orthologs and paralogs). Of course I agree that patterns of sequence evolution might lead you to conclude this relationship was true, but there was no experimental data.


In fact, as we say in our paper, rarely did anyone recognize that this claim needed to be tested, or even that it was a claim that could be tested. At the time Eugene Koonin was the only person to say this: “The validity of the conjecture on functional equivalency of orthologs is crucial for reliable annotation of newly sequenced genomes and, more generally, for the progress of functional genomics. The huge majority of genes in the sequenced genomes will never be studied experimentally, so for most genomes transfer of functional information between orthologs is the only means of detailed functional characterization” (http://www.ncbi.nlm.nih.gov/pubmed/16285863). I really liked the way that Eugene had said this, and started to refer to the idea that orthologs were more functionally similar than paralogs as the “ortholog conjecture.” So to be clear: I completely made up this phrase, but used the most evocative word from the Koonin paper.
Luckily for Pedja and me we had just gotten a small internal grant to work on genome annotation and we had an incoming master’s student (Nathan Nehrt) who was willing to work on a project intending to test the ortholog conjecture.
Interlude: the crappy state of things in the study of the evolution of function
In order to test anything about how function evolves between orthologs and paralogs—or between any genes—one of course needs some kind of data on gene function in multiple species. And this turns out to be a big problem.
Because, as Koonin says in the earlier quote, the vast majority of experimental data comes from a very few species, and these species are not exactly closely related. Here is an approximate phylogeny of the major eukaryotic model organisms:
It’s obvious from this figure that if you need both 1) lots of functional data from two species, and 2) a pretty good idea of exactly what the homologous relationships are between the genes you’re studying, you’re going to have to study human and mouse.
This is actually a pretty bleak picture for people who study molecular evolution (as I do). While we have tons and tons of sequence data both within and between species, and a very good idea about how these sequences evolve, and fancy models with which to analyze these sequences…we know next to diddly-squat about general patterns relating these sequence differences to functional differences. There are lots of interesting things to be gleaned from studies of sequence evolution, but it really would be nice to know something about the relationship between sequence and function.
What we found
What exactly does the ortholog conjecture predict? In my mind, at least, it predicts something like this:
In this completely fictitious graph the relationship between protein function and sequence similarity is a declining one, only it declines faster for paralogs than it does for orthologs. Also, just possibly, gene duplicates start out with slightly diverged function the minute they appear. Anyway, those were our predictions.
But here is what we found (Figure 1 in Nehrt et al. 2011):

(Panel A uses the Biological Process ontology and panel B uses the Molecular Function ontology.)
There are really two different, equally surprising results here. First, there is no relationship between sequence divergence and functional divergence for orthologs (among 2,579 one-to-one orthologs between human and mouse). Absolutely none—it’s a straight horizontal line. Second, there is a relationship for paralogs (among 21,771 comparisons), exactly as we predicted there would be. So according to our results, paralogs actually have more conserved function than do orthologs. Our interpretation of the data was that the most important determinant of function was the organismal context in which a gene/protein found itself: given the same amount of sequence divergence, two proteins in the same organism would be more functionally similar. For orthologs, this means that the sequence divergence of our target gene was not the most important thing, but rather the sum total of divergence in all of the genes that contribute to its cellular context. Which is why all the orthologs have on average similar functional divergence—they are all exactly the same age and hence have approximately the same levels of divergence in these interactors (in this case sequence divergence for paralogs is a much better indicator of their splitting time).
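The shape of that analysis can be sketched in a few lines. This is an illustrative reconstruction, not the authors' actual pipeline: the function names, the Jaccard overlap of GO term sets as a stand-in for their semantic-similarity measure, and the toy data are all my own assumptions.

```python
# Hypothetical sketch: mean GO-term overlap between gene pairs, binned
# by sequence divergence, computed separately for orthologs and paralogs.
from statistics import mean

def jaccard(terms_a, terms_b):
    """Functional similarity as Jaccard overlap of two GO term sets."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0

def binned_similarity(pairs, bin_width=0.1):
    """pairs: iterable of (sequence_divergence, go_terms_a, go_terms_b).
    Returns {bin_start: mean Jaccard similarity} for each divergence bin."""
    bins = {}
    for div, a, b in pairs:
        start = round(div // bin_width * bin_width, 10)
        bins.setdefault(start, []).append(jaccard(a, b))
    return {k: mean(v) for k, v in sorted(bins.items())}

# Toy data mimicking the qualitative pattern in Figure 1: ortholog
# similarity stays flat with divergence, paralog similarity declines.
orthologs = [(0.05, {"GO:1", "GO:2"}, {"GO:1", "GO:2"}),
             (0.25, {"GO:1", "GO:2"}, {"GO:1", "GO:2"})]
paralogs = [(0.05, {"GO:1", "GO:2"}, {"GO:1", "GO:2"}),
            (0.25, {"GO:1", "GO:2"}, {"GO:1", "GO:3"})]
print(binned_similarity(orthologs))
print(binned_similarity(paralogs))
```

The "straight horizontal line" result corresponds to the ortholog curve here showing no trend across bins, while the paralog curve declines.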
Without going through every result in the paper and our interpretation of every result, suffice it to say that after about a year-and-a-half of working on this (around February 2010), we were satisfied that we had something we were willing to submit. I even seem to remember showing the above figure to Jonathan on a visit to UC-Davis! So we did submit the paper, first to PNAS and then, after rejection, to PLoS Computational Biology, where it was rejected again.
The content of the reviews was approximately the same at both journals. Basically, people were not convinced of our results mostly because the functional relationships were all based on data in the Gene Ontology database. To be specific, the functional data we used came from experiments conducted in 12,204 different papers. We didn’t use any predicted functions, only functions assigned using experimental data. And we did A LOT of work to try to eliminate problems that might have affected our results, including repeating the main analysis using only GO terms common to both the human and mouse datasets. But there can still be bias hidden within these functional assignments because someone always has to interpret the experiment—to say that a yeast two-hybrid experiment means that a gene has function X. And because of these biases, people weren’t buying it.
To get a measure of functional similarity that did not depend on the interpretation of any experiments, we decided to repeat the entire analysis using microarray data, using the correlation in expression levels across 25 tissues as the measure of functional similarity. By this time Nathan was graduating and moving on to Maricel Kann’s lab as a research programmer, so we recruited one of Pedja’s Ph.D. students, Wyatt Clark, to pick up where Nathan had left off. (Wyatt had actually been a student in my undergraduate Evolution course a few years earlier, so we figured he knew something…) After repeating all of the GO-based analyses himself—always better to double-check, right?—Wyatt got all of the microarray data in order and produced this figure (Figure 4 in Nehrt et al. 2011):
So a year after we first submitted a paper, we submitted a new version to PLoS CB including the array analysis, and this was enough to convince the reviewers that our results were not merely due to some strange bias in GO.
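The array-based measure boils down to correlating expression profiles. A minimal sketch, assuming log-scale expression vectors indexed by tissue (the toy values and variable names are mine, and the paper used 25 tissues rather than five):

```python
# Hedged sketch of the microarray-based similarity measure: functional
# similarity for a gene pair as the Pearson correlation of their
# expression profiles across a panel of tissues.
import math

def pearson(xs, ys):
    """Pearson correlation between two expression profiles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy profiles over five tissues (hypothetical values):
human_gene = [1.0, 5.0, 2.0, 8.0, 3.0]
mouse_ortholog = [1.2, 4.8, 2.1, 7.9, 3.3]   # co-expressed across tissues
paralog = [8.0, 1.0, 7.0, 2.0, 6.0]          # anti-correlated profile

print(round(pearson(human_gene, mouse_ortholog), 3))
print(round(pearson(human_gene, paralog), 3))
```

Because the correlation is computed directly from measured intensities, no curator ever has to interpret an experiment, which is what made this analysis independent of any bias in GO.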
The fallout, and some responses
First, let me say that I had some idea that this would be a controversial-ish paper, and that we’d get at least some blowback. For about the first 20 versions of the manuscript (including some submitted versions) I put the words “ortholog conjecture” in quotes in the title, never an endearing move. (Pedja finally convinced me to take them out of the latest submissions.) But I also thought people would be happier that an untested assumption had finally been tested—and we have definitely gotten some positive feedback along these lines, including several groups that told us they have data that support our findings. By coincidence my lab had another paper come out the same week as this one (http://www.ncbi.nlm.nih.gov/pubmed/21636278), and I mistakenly thought it would generate much more attention. I still think the biological importance of the results in that one are much greater than the ortholog conjecture results, but either because we didn’t publish in an open-access journal (Jonathan is always right) or simply because the function-prediction community is more active on the interweb tubes, there have been a surprising number of critical responses (partially collected here: http://phylogenomics.blogspot.com/2011/09/some-links-on-ortholog-conjecture-paper.html). So here are some responses to general critiques.
The ortholog conjecture says only that orthologs are similar.
Okay, this one is a bit unfair, as only one person has said this. The real problem here is that Michael Galperin seems to have deeply misunderstood what we mean by the ortholog conjecture. According to him the ortholog conjecture is “the assumption that orthologs (genes with a common origin that were vertically inherited from the same gene in the last common ancestor of the host organisms) typically retain the same function or have closely related ones.” Umm, no. In fact, if you really think this is what the ortholog conjecture says, then our results support it—human and mouse orthologs do typically have closely related functions. But we are explicitly testing for a difference between orthologs and paralogs, not whether or not orthologs retain any functions. At no point did we say (or even hint) that orthologs should not be used for functional prediction. The whole point of our analysis and conclusions is that we should stop ignoring paralogs, which would give us a ton more data to use for the prediction of functions.
The assignments of orthology and paralogy are incorrect.
This is an easy one: we did in fact get the definitions of in- and out-paralogs correct (and laid them out in Figure S1). According to Sonnhammer and Koonin: “Our definition of ‘outparalogs’ is: paralogs in the given lineage that evolved by gene duplications that happened before the radiation (speciation) event” (http://www.ncbi.nlm.nih.gov/pubmed/12446146). For the purposes of our study, this means that outparalogs are defined as any paralogs that diverged before the speciation event between human and mouse and inparalogs diverged after this speciation event. Outparalogs do not indicate only paralogs in two different species, though by necessity in our dataset inparalogs are only found in the same species (all in human or all in mouse). Therefore, with respect to our conclusion that the most important determinant of function is which genome you are found in (i.e. context), it wouldn’t matter if we had incorrect gene trees: we would never confuse two genes in the same species (either inparalogs or some of the outparalogs) with two genes in different species (all orthologs and the remaining outparalogs).
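The distinction in that figure can be sketched as a toy classifier. The function name and the ~90 Mya human-mouse split time are my own assumptions for illustration; only the definitions themselves come from the paper and from Sonnhammer and Koonin.

```python
# Illustrative sketch of the in-/out-paralog definitions (Figure S1):
# outparalogs arose from duplications BEFORE the human-mouse speciation,
# inparalogs from duplications AFTER it; orthologs trace back to the
# speciation event itself.
HUMAN_MOUSE_SPLIT_MYA = 90.0  # approximate speciation time (assumed)

def classify_pair(last_common_event, event_time_mya):
    """last_common_event: 'speciation' or 'duplication' at the pair's
    last common ancestor; event_time_mya: its age in millions of years."""
    if last_common_event == "speciation":
        return "ortholog"
    # A duplication older than the speciation -> outparalog;
    # a duplication younger than the speciation -> inparalog.
    if event_time_mya > HUMAN_MOUSE_SPLIT_MYA:
        return "outparalog"
    return "inparalog"

print(classify_pair("speciation", 90.0))    # human gene vs. its mouse ortholog
print(classify_pair("duplication", 300.0))  # ancient, pre-speciation duplicate
print(classify_pair("duplication", 20.0))   # recent, within-lineage duplicate
```

Note that the species membership of the pair never enters the classification, which is why incorrect gene trees could not confuse same-species pairs with between-species pairs.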
You should have inferred functions yourselves
This is a fair suggestion, and not having enough time to annotate functions for 40,000 proteins would be a pretty weak excuse for doing good science. Instead…I’ll just say that it turns out professional curators are much better at assigning functions than even the original study authors (see http://www.ncbi.nlm.nih.gov/pubmed/20829821). Curators have a much broader view of the whole set of terms available in any ontology, and a much more consistent idea of how to apply these terms. My favorite line from the above cited article: “…because of the relatively low accuracy of the authors’ submissions, the use of authors’ annotations did not result in saving of curators’ time…”
GO is not appropriate for this analysis because it is biased.
This is the most frustrating criticism of our study, if only because it’s partly true: GO is biased. In our paper we actually detail several of these biases, including the observation that the same set of authors will give two proteins more-similar functions than will two different sets of authors. We tried very hard to attempt to control for these biases, though of course one cannot account for all of them. The most uncharitable part of this critique, however, has to be the fact that people conveniently forgot to say that our array analysis was completely distinct from the GO-based analysis (though it has its own issues), and that Burkhard Rost’s analysis of protein-protein interaction (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0020079) was also completely free of any bias in GO and was consistent with all of our results.
More annoying than this, you’d think from some of the critiques of GO that it was some sort of fly-by-night operation that no one should ever depend on. I mean, c’mon—there are human curators and human experimenters and of course they’re all biased so badly one could never compare functions between proteins much less between species. What were we thinking? (Only that the original GO paper has been cited >7000 times.) Funnily enough, at several points during the course of this work Pedja suggested—only half-jokingly—that we should just assume the ortholog conjecture was correct and write a paper about how GO must be wrong. Seriously, though: one would think from the excuses people came up with for the problems inherent in GO that we should simply stop using it to, you know, predict function in other species. And we were applying it to two relatively closely related mammals, one of which is explicitly a model for the other.
What next?
Our paper laid out several explicit hypotheses about the evolution of function that arose from our findings. Unfortunately, testing any of these hypotheses will require a ton more functional data, in more than one species. I know there are multiple groups working to collect these sorts of labor-intensive datasets, and Pedja and I are thinking about doing it ourselves (with collaborators, of course!). Massive datasets that reveal protein function will always be a lot harder to collect than sequence data, especially ones free from biases.
So let’s get to it…

—————————

Note – Toni Gabaldón was trying to post a detailed response but Blogger kept cutting him off with a character limit.  So I have posted his response below.

I appreciate the effort by Matthew Hahn in explaining the story behind his paper on the so-called "ortholog conjecture" and in facing some of the criticism. This paper attracted my interest, as it did that of many others who work on or simply use orthology. For instance, it was chosen by one of my postdocs for our "Journal Club" meeting, and it was discussed during our last "Quest for Orthologs" meeting in Cambridge. I think it is raising a necessary discussion, and therefore I think it is a good paper. This does not mean that I fully agree with the interpretation and conclusions ;-). I hope to modestly contribute to this debate with the following post.

I think one of the reasons this paper has caused so much debate is that its conclusions seem to challenge common practice (inferring function from orthologs), and could be interpreted as a call to change genome annotation strategies. I think, however, that one should interpret these results carefully before starting to annotate based on paralogous proteins. As I will discuss below, one of the problems is that we need to agree on what the conjecture is before we can agree on how to test it. I see three main points that can be a source of confusion: i) what is actually stated by the conjecture, ii) the issue of annotation, and iii) the issue of time.

1) What is the “ortholog conjecture”?
Or, in other terms, when should we expect orthologs to be more likely to share function than paralogs? Always? Of course not. All of us would agree that two recently duplicated paralogs are likely to be more similar in function than two distant orthologs, so it is obvious that the conjecture is not simply "orthologs are more similar in function than paralogs". In reality, the expectation that orthologs are more likely to be similar in function than paralogs (at least this is how I interpret it) is directly related to the effect that duplication has on functional divergence. If gene duplication has some effect on functional divergence (even if not in 100% of cases), then, all other things being equal (divergence time, history of speciation/duplication events, except for the duplication defining the paralogs), one would expect orthologs to be more likely to conserve function.

I think this complexity is not well considered (by many authors, in general). Hahn refers to the famous review of orthology by Koonin (2005) as the source for the term "ortholog conjecture". In that paper, however, the conjecture is always discussed within the context of genes across two particular species, whereas in Hahn's paper it is extended to other contexts as well. Thus, the proper context in which to test this conjecture is only between orthologs and between-species paralogs. As we can see, the red and purple lines in Figure 2 of Hahn's paper do not show any clear difference.

Secondly, Koonin was very cautious in his paper, stating that he was referring to "equivalent functions" and not exactly the same "function", correctly implying that the functional contexts would be different in the two species. This brings me to the next point.

ii) annotation
If the expectation of functional conservation of orthologs refers to a given pair of species, then it makes no sense to test that expectation between paralogs within the same species and orthologs in different species. We were interested in this issue, and it took us some effort to control for this "species" influence on the comparison; if you are interested, you can read our paper on divergence of expression profiles between orthologs and paralogs (http://www.ncbi.nlm.nih.gov/pubmed/21515902)

As Hahn finds, and as was anticipated by Koonin in that review, there is a huge influence of the "species context", a big constraint on what fraction of the function is shared. Indeed, I think it is the dominant signal in Hahn's paper. Why is that? One possibility is that the functional context determines the function, and I agree. However, we should not discount biases in how the different communities working around a model species define processes and functions, and also in the types of experiments that are usually done. For instance, experimental inference from KO mutants might be common for mouse, but I guess this is not the case in humans (!!). I think this may have a big influence and might even be the dominant signal in Hahn's paper.

Finally, function has many levels, and I expect subfunctionalization mostly affects the lower (i.e. more specific) levels. Biases may also exist in the level of annotation between species, or between families of different sizes (contributing more or less to the ortholog/paralog classes).

Microarray data are less likely to be subject to biases (although some may exist); at least they should be free of "human interpretation biases", and so Hahn and colleagues did well, in my opinion, in testing that dataset. It is important to note that for microarrays, and for orthologs and between-species paralogs (which I think is the right frame for testing the conjecture), orthologs are more likely to share an expression context. This is compatible with what we found in the paper mentioned above, and compatible with the ortholog conjecture as stated by Koonin (across species).

iii) time
Finally, one aspect which I think is fundamental is the notion of "divergence time". Since paralogs can emerge at different time-scales, they comprise a heterogeneous set of protein pairs. Most comparisons of orthologs and paralogs (Hahn's as well) use sequence divergence as a proxy for time. However, this is only a poor estimate, especially when duplications are involved, as here (we explored this issue in the past: http://www.ncbi.nlm.nih.gov/pubmed/21075746). This means that for a given divergence time, paralogs may have larger sequence divergence than orthologs of the same age, or smaller (if gene conversion is playing a role). Is the conjecture based on sequence divergence or on divergence time? I think the original rationale for using orthology to annotate across species is based on the notion of comparing things at the same evolutionary distance. Thus, basing our conclusions on sequence divergence might not be the proper way of doing it.

CONCLUSIONS AND PROPOSAL FOR RE-STATEMENT

To conclude, and with the intention of going beyond this particular paper, I would finish by saying that the key to the problem lies in how we interpret the so-called "ortholog conjecture", or in what our expectations are about how function evolves. What I get from re-reading Eugene Koonin's paper, and how I use that "assumption" in my day-to-day work, is the following:

“Orthologs in two given species are more likely to share equivalent functions than paralogs between these two species”

Therefore the notion of "across the same pair of species" is important, and thus only part of the comparisons made by Hahn and colleagues could directly test this. Looking at the microarray data and the between-species comparisons, the conjecture may even hold true!!

I do think, however, that the conjecture as stated above is limited and does not capture the complexity of orthology relationships. Indeed, we, and many other researchers, tune the confidence of orthology-based annotation based on whether the orthologs are one-to-one, one-to-many or many-to-many, or even whether they are "super-orthologs" (with no duplication event in the lineages separating the two orthologs).

Since the underlying assumption of the ortholog conjecture is that duplication may (though not necessarily always) promote functional shifts, many-to-many orthology relationships will tend to include orthologous pairs with different functions.

 Thus I would re-state the conjecture (or expectation) as follows:

 “In the absence of additional duplication events in the lineages separating them, two orthologous genes from two given species are more likely to share equivalent functions than two paralogs between these two species”

 This would be a more conservative expectation, which is closer to the current use of orthology-based annotation that tends to identify one-to-one orthologs, rather than any type.

When duplications start appearing in subsequent lineages, creating one-to-many or many-to-many orthology relationships, the situation is less clear. Following the assumption that duplications may promote functional divergence, one could expand the conjecture: "the more duplications in the evolutionary history separating two genes, the lower the expectation that these two genes share equivalent functions".

I wrote this contribution on the fly, and surely there are ways of expressing it in more appropriate terms. In any case, I hope I made clear the idea that the conjecture emerges from the notion of duplications causing functional shifts, and that our expectations will be clearer if expressed in those terms. This is along the lines of what Jonathan Eisen has mentioned about considering the whole phylogenetic story when annotating genes.

Under this perspective, the really important hypothesis is that "duplications tend to promote functional shifts". I think this is based on solid grounds and has been tested intensively in the past.

 Cheers,

Toni Gabaldón

http://treevolution.blogspot.com

Interested in sex? How about in bacteria? Then these #PLoSGenetics papers are for you

Well I was torn about this. Should I title the post "ICE, ICE, Bacterial BABIES" or say something about sex? I settled on sex, but not sure if that was wise.

Anyway – quick post to say that there are two papers from PLoS Genetics last month that caught my eye. They are

The latter is a "review" paper linked to the first, which is a research paper. Together the papers provide both good background and a window into modern studies of "ICEs", or integrative conjugative elements, in bacteria.

I like the summary from the first paper:

Some mobile genetic elements spread genetic information horizontally between prokaryotes by conjugation, a mechanism by which DNA is transferred directly from one cell to the other. Among the processes allowing genetic transfer between cells, conjugation is the one allowing the simultaneous transfer of larger amounts of DNA and between the least related cells. As such, conjugative systems are key players in horizontal transfer, including the transfer of antibiotic resistance to and between many human pathogens. Conjugative systems are encoded both in plasmids and in chromosomes. The latter are called Integrative Conjugative Elements (ICE); and their number, identity, and mechanism of conjugation were poorly known. We have developed an approach to identify and characterize these elements and found more ICEs than conjugative plasmids in genomes. While both ICEs and plasmids use similar conjugative systems, there are remarkable preferences for some systems in some elements. Our evolutionary analysis shows that plasmid conjugative systems have often given rise to ICEs and vice versa. Therefore, ICEs and conjugative plasmids should be regarded as one and the same, the differences in their means of existence in cells probably the result of different requirements for stabilization and/or transmissibility of the genetic information they contain.

That should be enough to get people started. And that is alas all I have time to write about here.

OK I now officially love BioITWorld; on the cover, in my #Redsox #PLoS1 shirt, called “Ace Pitcher”
