An influential article in my career development was this piece in the Washington Post in 1996 by Stephen Jay Gould. I was already convinced bacteria were important and interesting. But it was nice to see the person who got me interested in evolution (via his books and then a class I took from him in college) emphasizing the bacteria. Here is a link to the Post archive of it.
PLANET OF THE BACTERIA – The Washington Post
Well, my mom sent me a copy of it and I kept it all these years. I just scanned it, so I thought I would share what it looked like in the paper, since this is VERY different from looking at the text on the Post archive site.
I like the last part too – an ad for the American Society for Microbiology that went with the article.
As part of my NSF Research Coordination Network grant (RCN EukHiTS), I am currently managing a number of Mendeley groups that amalgamate relevant journal articles on different topics related to environmental PCR, metagenomics, and microbial eukaryotes. These groups are open (anyone can join with a Mendeley account), and I’m trying to keep them regularly updated with new articles (Mendeley members can also add articles, which I strongly encourage!):
- Eukaryotic HTP Studies – Publications relevant to high-throughput environmental sequencing approaches focused on microbial eukaryotes. Articles will include any type of -Omic methods (marker gene amplicons, metagenomics, metatranscriptomics, etc.), eukaryote-focused tools/pipelines, and review/opinion pieces.
- rRNA in Eukaryotes – Literature related to the ribosomal repeat array in eukaryotic genomes – variation in rRNA gene copy number, intragenomic polymorphisms, concerted evolution, transposable elements and their evolutionary and ecological implications.
- Environmental PCRs – primer sets and bias – Literature related to primer set usage and bias across all taxonomic groups (bacteria, archaea, fungi and microbial eukaryotes) – includes primer sets and methods focused on 16S, 18S, ITS, other rRNA, COI, and other marker genes used for environmental sequencing.
- eDNA in aquatic ecosystems – This group focuses on environmental DNA (eDNA) applications in aquatic ecosystems, including use of eDNA in bioassessment and environmental monitoring. Literature collection covers methods, analytical tools, and empirical studies (both basic and applied science).
Below is a guest post from Kevin Penn, who used to work in my lab …
I am a former Research Associate of Jonathan’s interested in understanding the evolution and ecology of microbes in natural environments. Recently I’ve become interested in learning about the expression of secondary metabolite related genes in natural settings to put the genes’ products into an ecological context, because almost certainly microbes are not making natural products just to benefit humans. I am currently studying these topics as a post-doc in Janelle Thompson’s lab at MIT.
When I got to MIT there was a set of paired-end Illumina HiSeq data from six time points collected over one day-night cycle from the Kranji Reservoir in Singapore, which was experiencing a cyanobacterial Harmful Algal Bloom (cyanoHAB). Note that “algal” in this case means bacterial; I used to argue that this is taxonomically incorrect, but used colloquially I think it works. These samples are what the paper “Secondary metabolite gene expression and interplay of bacterial functions in a tropical freshwater cyanobacteria bloom” is based on. MIT has a program in Singapore called Center for Environmental Sensing and modeling/Singapore MIT Alliance (CENSAM/SMART), and one of its projects is to learn about microbial populations associated with the drainages and reservoirs across the city/state/country. The motivation for the study (Penn, et al 2014) is based on two observations. 1) The idea to sample a day-night cycle of a harmful algal bloom derived from experiments done on marine Prochlorococcus showing major changes in gene expression in the evenings and mornings and more similar profiles at noon and midnight (Zinser, et al 2009). 2) An initial sample collection and analysis for this study did not readily detect genes for the toxin microcystin in drainages around the reservoir catchment (Nshimyimana, et al 2014), indicating the cyanobacterium was growing in the reservoir (i.e. not being flushed in). We knew the bloom in the reservoir was dominated by Microcystis aeruginosa, but now we wanted to learn whether microcystin toxin genes were expressed in the reservoir and, if so, whether they were expressed around the clock.
Harmful algal blooms are of concern because they appear to be increasing in frequency on a global scale. HABs are not only eyesores; they also produce toxins that make lakes unusable for drinking water and recreation. For a good introduction to HABs I suggest reading an excellent book, “The algal bowl: overfertilization of the world’s freshwaters and estuaries” by David W. Schindler & John R. Vallentyne. But I should note there are probably thousands of books written on the subject. Below you can see what our study site looked like during a bloom, with a surface scum visible, and during conditions where the water is a bit more clear (post bloom).
Polyketide synthases (PKS) and Non-ribosomal peptide synthetases (NRPS)
The search for expression of microcystin toxin genes is also a part of my larger interest in learning about the expression of PKS and NRPS genes in natural settings. PKS and NRPS derived molecules represent a large class of natural products famous both as toxins and as medicines used to treat human disease. Two phyla of bacteria are historically known for their production of these compounds (Actinobacteria and Cyanobacteria). For example, the PKS and NRPS derived microcystin toxin is produced by M. aeruginosa, and members of the phylum Actinobacteria produce the potent antibiotic rifamycin. The expression and presence of most PKS and NRPS pathways in natural settings is currently not very well understood.
Prior to this work it was not clear that bacterial PKS and NRPS pathways are expressed in natural settings. The products of the microcystin pathways are present in harmful algal blooms (thus the term Harmful). This made Kranji Reservoir a good system to study because we should observe the transcripts for microcystin. PKS and NRPS genes can be highly repetitive and similar between different pathways, so we were not sure we would find them with Illumina-type sequencing. Based on my initial tests using a tool called NaPDoS, which I helped develop at Scripps to quickly identify sequence tags from PKS and NRPS gene pathways, it was clear we could see the expression of many different pathways in our data. This spurred me on to look at the differences in expression over time. The examination of the time series revealed that there appears to be a rhythm to expression of PKS and NRPS genes and that, strikingly, one of the most highly expressed PKS/NRPS gene clusters in M. aeruginosa has not been linked to a molecule. This is especially interesting from an ecological perspective.
One of the cool things about science is that it can be predictive. In an experiment on photosynthetic bacteria, you would hope that your expression data reflect the idea that photosynthetic life uses light to photosynthesize, and that the genes that code for the machines that harness light would be most highly expressed during the day. We call that the “sanity check,” and it came out very nicely in our metatranscriptomic data, showing that photosynthesis-related genes cycle in the environment and are highly correlated with the day-night cycle. Our observation that the things we expected to be highly expressed were highly expressed gave us confidence that our data may have other patterns that we would not necessarily think to look for. We started to look at broader categories of functional genes for the top four phyla. From this analysis we noticed that some phyla were enriched for particular genes relative to other phyla, which in turn allowed us to make some ecological predictions about how each group might be functioning in the bloom community. For example, look at figures 4 and 5 in the paper and you can see that Actinobacteria are mostly transporting photosynthetically derived carbohydrates, but Bacteroidetes groups are mostly transporting peptides; furthermore, groups within the Proteobacteria are expressing most of the motility and chemotaxis related genes.
Quantifying natural microbial communities remains a significant challenge and, more importantly, identifying ecological functions for phenotypes promises to provide microbial evolutionary biologists with crucial data to learn about the evolution of bacteria. Imagine trying to study the evolution of a hand if you had no idea of the ecological function of the hand.
Problem Solving- paired end reads
One of the important decisions we had to make before starting the analysis of the Illumina data centered on the state of paired-end sequencing in metatranscriptomics. Paired-end sequencing is a great boon for Illumina sequencing, and Illumina sequencing created a huge opportunity for the field of metagenomics. But paired Illumina reads that do not collapse into one can represent a large portion of an Illumina sequence run, despite efforts to create fragments short enough to overlap yet large enough to make paired-end sequences more informative. Paired ends can complicate matters because they may represent two genes in one operon, or two genes from different operons, which is a problem for analyses trying to assign function to reads. The other issue is that, in assigning taxonomy to reads, the two reads of a pair may by chance alone match similar sequences from different organisms. MEGAN tries to deal with this by increasing bit scores for sequences that match the same thing. We made the decision to use paired information to improve the confidence in function assignment in MEGAN if both reads hit the same gene, and treated the 1 and 2 reads as separate for counting total reads matching a gene if the read counts were not to be normalized to gene length. Another aspect of the study focused on calculating expression for genes from the bloom former M. aeruginosa using RPKM, which does take gene length into account; thus we decided to treat the 1 and 2 reads as technical replicates for calculating RPKMs and averaged the values.
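As a rough illustration of that last choice, here is a minimal Python sketch of RPKM (reads per kilobase of gene per million mapped reads) computed separately for the 1 and 2 reads and then averaged. The function names and the numbers are made up for illustration, and the sketch assumes each read set is normalized by its own total of mapped reads; the pipeline actually used for the paper may differ in its details.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # read_count / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def paired_rpkm(count_r1, count_r2, gene_length_bp, total_r1, total_r2):
    """Treat the 1 and 2 reads as technical replicates and average their RPKMs."""
    r1 = rpkm(count_r1, gene_length_bp, total_r1)
    r2 = rpkm(count_r2, gene_length_bp, total_r2)
    return (r1 + r2) / 2.0


if __name__ == "__main__":
    # Hypothetical gene: 1,000 bp long, hit by 100 of the 1 reads and 120 of
    # the 2 reads, with one million mapped reads in each set.
    print(paired_rpkm(100, 120, 1000, 1_000_000, 1_000_000))
```

Averaging the two values, rather than summing the counts, keeps a gene from looking twice as expressed just because both ends of a fragment landed on it.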
This experiment has given us the first glimpse at expression of toxin genes in a natural setting and provided us with some clues of microbial phylum-level interplay. The next experiment to further test our observations includes a greater sampling effort over two day-night cycles, at a greater frequency, with replicates, and with sampling at the surface and subsurface. This work is being done in collaboration with another research group interested in Microcystis and harmful algal blooms at the National University of Singapore, led by Prof. Karina Gin. It is known that M. aeruginosa strains migrate up and down in the water column, and we want to check whether some of our cyclic observations relate to the presence of different strains at the surface throughout the day. A follow-up study in progress is to look at the reservoir community during non-bloom conditions and run perturbations to identify the effects of the addition of nitrate, phosphate, and microcystin on the microbial community, in hopes of learning whether there are expression patterns that show how Microcystis is able to bloom.
The exact story behind the paper will be better understood if it is supplemented with a brief background about my introduction to genomics and microbial ecology, which mainly occurred after starting work as a Research Associate for Jonathan. Looking back “many years ago,” I had just finished up an undergrad degree at UCSB in Aquatic Biology and I was looking for a job as a scientist when I met Jonathan. It was really my first meetings with Jonathan that set my way forward in research. I wanted to learn about how things evolve and the ecological functions of traits, and Jonathan wanted to understand how all life evolved, which meant he was studying the genomes of microbes. In our first meeting we discussed how genomics and methods associated with genomics, namely 16S rRNA gene community studies, were going to allow us to learn all about microbial ecosystems and even allow us to do in situ ecological studies of microbes (the term metagenomics was not widely known or used at this time). As TIGR slowly evolved into JCVI, I began my move to grad school to work in the lab of Paul Jensen at Scripps Institution of Oceanography, who had recently sequenced the genomes of a couple of species of marine actinomycetes. In grad school I spent a lot of time learning about natural products and the genomes of a famous group of organisms called Actinomycetes, which make about 80% of the antibiotics we take today. By the time I finished grad school I had become acutely interested in learning about the expression of natural product-related genes in a natural setting.
Our latest paper published in ISME reflects a combination of my exposure to some very different fields of scientific research, from studying genomics and community diversity at The Institute for Genomic Research (TIGR) to my PhD work in natural products research at Scripps and now my studies on community gene expression dynamics in Harmful Algal blooms at MIT. I have been researching the ideas about in situ microbial ecology that Jonathan discussed with me those many years ago, and in this paper I continue to expand our knowledge about what microbes are doing in natural settings.
Of course I did not do this paper in a vacuum at MIT. Prof. Janelle Thompson organized the data collection, co-wrote the paper and taught me a lot about the appropriate statistics we needed to use to analyze our data and interpret the results. Graduate students Tim Helbig and Sonia Timberlake helped me get going on the computer clusters here at MIT. One of my favorite parts of moving institutions is learning the ins and outs of new computer clusters. I have been funded as a postdoctoral associate at MIT and subsequently by the NSF post-doctoral fellowship intersection of math and biology during this research. Singapore CENSAM/SMART has supported our travels to Singapore along with sequencing costs.
Quick post here. This paper came out a few months ago but it was not freely available so I did not write about it (it is in PNAS but was not published with the PNAS Open Option — not my choice – lead author did not choose that option and I was not really in the loop when that choice was made).
Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. [Proc Natl Acad Sci U S A. 2013] – PubMed – NCBI.
Anyway – it is now in Pubmed Central and at least freely available so I felt OK posting about it now. It is in a way a follow up to the “A phylogeny driven genomic encyclopedia of bacteria and archaea” paper (AKA GEBA) from 2009, with this paper zooming in on the cyanobacteria.
Well, this is one of the bigger screw ups in terms of evolution I have seen at a major journal in a while. See the following paper in Nature: The catalytic mechanism for aerobic formation of methane by bacteria : Nature. The paper discusses some functions of “the ocean-dwelling bacterium Nitrosopumilus maritimus.” Some of what is reported in the paper is perhaps interesting (alas I do not have access). But painfully, there is one big big big big mistake – you see Nitrosopumilus maritimus is not a bacterium. It is an archaeon (see for example this paper on its genome).
I got pointed to this by Uri Gophna (in an email and in a comment on my blog) (also see this on Twitter). Sure – some people debate the structure of the tree of life. But I am pretty certain the authors here (Siddhesh S. Kamat, Howard J. Williams, Lawrence J. Dangott, Mrinmoy Chakrabarti & Frank M. Raushel) are not trying to make a statement about monophyly of bacteria or just what archaea are. They just made what seems to be a colossal screw up. And Nature not only let them, but added to it with things like their “Editors Summary”:
Novel bacterial biosynthesis of methane
Aerobic marine organisms produce significant quantities of the potent greenhouse gas methane, much of it via the cleavage of the highly unreactive carbon–phosphorus bonds of alkylphosphonates. In this study the authors explore the mechanism of PhnJ, an unusual radical S-adenosyl-L-methionine (SAM) enzyme that appears to use a cysteine-based thiyl radical to help catalyse the conversion of the alkylphosphonate substrate to methane and ribose-1,2-cyclic phosphate-5-phosphate. This reaction, not previously encountered in biological chemistry, establishes a novel mechanism for cleaving carbon–phosphorus bonds to form methane and phosphate via a covalent thiophosphate intermediate.
And for this taxonomic alchemy (converting an archaeon to a bacterium) I am awarding them and Nature my coveted “Twisted Tree of Life Award #16”.
UPDATE 5/28 7AM
I love the ad that came up while I was writing this post and searching for some information. I think Nature could use the services from this ad:
Below is a guest post from my friend and colleague Kimmen Sjölander, Prof. at UC Berkeley and phylogenomics guru.
Announcing the FAT-CAT phylogenomic annotation webserver.
FAT-CAT is a new web server for phylogenomic prediction of function and ortholog identification and for taxonomic origin prediction of metagenome sequences based on HMM-based classification of protein sequences to >93K pre-calculated phylogenetic trees in the PhyloFacts database. PhyloFacts is unique among phylogenomic databases in having both broad taxonomic coverage – more than 7.3M proteins from >99K unique taxa across the Tree of Life, including targeted coverage of genomes from Eukaryotes, Bacteria and Archaea – and integrating functional data on trees for Pfam domains and multi-domain architectures. PhyloFacts trees include functional and annotation data from UniProt (SwissProt and TrEMBL), GO, BioCyc, Pfam, Enzyme Commission and other sources. The FAT-CAT pipeline uses HMMs at all nodes in PhyloFacts trees to classify user sequences to different levels of functional hierarchies, based on the subtree HMM giving the sequence the strongest score. Phylogenetic placements within orthology groups defined on PhyloFacts trees are used to predict function and to predict orthologs. Sequences from metagenome projects can be classified taxonomically based on the MRCA of the sequences descending from the top-scoring subtree node. Because of the broad taxonomic and functional coverage, FAT-CAT can identify orthologs and predict function for most sequence inputs. We’re working to make FAT-CAT less computationally intensive so that users will be able to upload entire genomes for analysis; in the interim, we limit users to 20 sequence inputs per day. Registered users are given a higher quota (see details online). We’d love to hear from you if you have feature requests or bug reports; please send any to Kimmen Sjölander – kimmen at berkeley dot edu (parse appropriately).
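To make the taxonomic step above a little more concrete, here is a small Python sketch of taking the MRCA (most recent common ancestor) of the sequences under a top-scoring subtree node. This is not FAT-CAT code: the `mrca` function, the root-to-leaf list representation of a lineage, and the simplified example lineages are all assumptions made for illustration.

```python
def mrca(lineages):
    """Return the deepest taxon shared by every lineage.

    Each lineage is a list of taxon names ordered from the root down,
    e.g. ["Bacteria", "Cyanobacteria", "Microcystis"]. Returns None if the
    lineages share no taxon at all.
    """
    if not lineages:
        raise ValueError("need at least one lineage")
    shared = None
    # Walk down rank by rank and stop at the first disagreement.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared = names_at_rank[0]
        else:
            break
    return shared


if __name__ == "__main__":
    # Hypothetical lineages for the sequences under the top-scoring node.
    subtree_lineages = [
        ["Bacteria", "Cyanobacteria", "Microcystis"],
        ["Bacteria", "Cyanobacteria", "Prochlorococcus"],
    ]
    print(mrca(subtree_lineages))
```

In this toy case a metagenome read placed at that node would be called at the level of Cyanobacteria rather than assigned to either genus, which is the conservative behavior an MRCA-based call is meant to give.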
Interesting paper came up in my automated google searches for “phylogenomics”: Transitioning Toward a Universal Species Concept for the Classification of all Organisms | InTechOpen. It is by Jim Staley, who has been writing a lot about microbial species concepts in the last few years, in addition to trying to bridge the gap between bacteria/archaea and eukaryotes in terms of species concepts. Not sure how I feel about everything in the paper but it has a really nice history of how species have been defined for bacteria. He breaks down this history into four periods:
- Discovery of microorganisms,
- Advent of pure cultures and phenotypic features,
- Introduction of molecular analyses and
- Gene sequencing and genomics.
And goes through a bit of detail on each one. He also discusses what he sees as a need for a universal species concept and even makes some suggestions about how it might be implemented. Definitely worth a read.
Some related posts of mine and or links of potential interest:
All interested in microbes and their genomes should check out The Microbial Earth Project. It “is an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains. The name of the project comes from the recognition that Earth is a predominantly a microbial planet, and by effect in order to understand life on our planet, we need to understand how microbial life works.”
There are some 10,000 described type strains of bacteria and archaea. Not really a lot given that there are probably millions upon millions of species of bacteria and archaea. But it is what we have available to us in terms of the formally described and accepted species for which there is an available cultured strain.
At this site you can do things like “Adopt a Type Strain” or view a cool “Map of the type strains”.
The Steering Committee for the project is
Much of the real work is being done by Nikos Kyrpides, George Garrity, and others, though I am very pleased to be a member of the Steering Committee. One of my key jobs will be to get the word out early and often. Hence this post.
Barny Whitman asked me to post this announcement and, well, I am. I made one edit below (see strikethrough) in honor of Norm Pace.
Genomic Sequencing of
Prokaryotic Bacterial and Archaeal
The Community Sequencing Program (CSP) Quarterly Microbial call of the DOE Joint Genome Institute provides a great opportunity to obtain draft genomic sequences of the type strains of bacterial and archaeal species. The type strains may also include proposed species prior to publication. Type strains must be relevant to DOE mission areas, such as bioenergy, biogeochemistry, bioremediation, carbon cycling, and phylogenetic diversity. However, strains of human pathogens and human associated species are not eligible. Proposals for genome sequencing of type strains can be submitted through the CSP Quarterly Microbial call, whose deadline is December 17, 2012, with approval usually being completed within one month. Up to 12 strains can be included in each proposal. Proposals for larger numbers of strains need to be submitted to the CSP annual call in the spring. If you cannot make the December call, Quarterly calls are also scheduled for March 25, June 17, and September 23, 2013.
Proposals may be completed on-line at: http://proposals.jgi-psf.org/proposals. You will need to register and sign in to this server. Once on the server, follow the links to the “CSP Quarterly Microbial/Metagenome”. All strains will have to have been deposited in a culture collection, including proposed type strains prior to publication. If a culture collection ID is not available, you can attach a copy of the Certification of Availability. Once approved, you will need to provide 5-10 µg of high molecular weight DNA.
For questions, contact Barny Whitman, University of Georgia (email@example.com).
Fun use of next generation sequencing in this paper: PLOS ONE: Next-Generation Sequencing Reveals Significant Bacterial Diversity of Botrytized Wine. They used sequencing to characterize the diversity of microbes associated with botrytized wine (wine produced from grapes infected with the mold Botrytis cinerea). They focused in particular on Dolce wine (not 100% sure what this is but I think it is wine from the Dolce winery …). And they focused in particular on the bacteria associated with this wine as it was being produced. Anyway … I am no food/drink microbiologist … but this seems cool.