Below is a guest post from Kevin Penn, who used to work in my lab …
I am a former Research Associate of Jonathan’s interested in understanding the evolution and ecology of microbes in natural environments. Recently I’ve become interested in learning about the expression of secondary metabolite related genes in natural settings to put the genes’ products into an ecological context, because almost certainly microbes are not making natural products just to benefit humans. I am currently studying these topics as a post-doc in Janelle Thompson’s lab at MIT.
When I got to MIT there was a set of paired end Illumina HiSeq data from six time points collected over one day night cycle from the Kranji Reservoir in Singapore, which was experiencing a cyanobacterial Harmful Algal Bloom (cyanoHAB). Note that algal in this case means bacterial; I used to argue that this is taxonomically incorrect, but used colloquially I think it works. These samples are what the paper “Secondary metabolite gene expression and interplay of bacterial functions in a tropical freshwater cyanobacteria bloom” is based on. MIT has a program in Singapore called Center for Environmental Sensing and Modeling/Singapore MIT Alliance (CENSAM/SMART), and one of its projects is to learn about microbial populations associated with the drainages and reservoirs across the city/state/country. The motivation for the study (Penn, et al 2014) is based on two observations. 1) The idea to sample a day night cycle of a harmful algal bloom derived from experiments done for marine Prochlorococcus showing major changes in gene expression in the evenings and mornings and more similar profiles at noon and midnight (Zinser, et al 2009). 2) An initial sample collection and analysis for this study did not readily detect genes for the toxin microcystin in drainages around the reservoir catchment (Nshimyimana, et al 2014), indicating the cyanobacterium was growing in the reservoir (i.e. not being flushed in). We knew the bloom in the reservoir was dominated by Microcystis aeruginosa, but now we wanted to learn whether microcystin toxin genes were expressed in the reservoir and, if so, whether they were expressed around the clock.
Harmful algal blooms (HABs) are of concern because they appear to be increasing in frequency on a global scale. HABs are not only eyesores; they also produce toxins that make lakes unusable for drinking water and recreation. For a good introduction to HABs I suggest reading an excellent book, “The algal bowl: overfertilization of the world’s freshwaters and estuaries” by David W. Schindler & John R. Vallentyne, though I should note there are probably thousands of books written on the subject. Below you can see what our study site looked like during a bloom, with a surface scum visible, and during conditions where the water is a bit more clear (post bloom).
Polyketide synthases (PKS) and Non-ribosomal peptide synthetases (NRPS)
The search for expression of microcystin toxin genes is also part of my larger interest in learning about the expression of PKS and NRPS genes in natural settings. PKS and NRPS derived molecules represent a large class of natural products famous both as toxins and as medicines used to treat human disease. Two phyla of bacteria are historically known for their production of these compounds (Actinobacteria and Cyanobacteria). For example, the PKS and NRPS derived microcystin toxin is produced by M. aeruginosa, and members of the phylum Actinobacteria produce the potent antibiotic rifamycin. The expression and presence of most PKS and NRPS pathways in natural settings are currently not very well understood.
Prior to this work it was not clear whether bacterial PKS and NRPS pathways are expressed in natural settings. The products of the microcystin pathways are present in harmful algal blooms (thus the term Harmful). This made Kranji Reservoir a good system to study because we should observe the transcripts for microcystin. PKS and NRPS genes can be highly repetitive and similar between different pathways, so we were not sure we would find them with Illumina type sequencing. Based on my initial tests using a tool called NaPDoS, which I helped develop at Scripps to quickly identify sequence tags from PKS and NRPS gene pathways, it was clear we could see the expression of many different pathways in our data. This spurred me on to look at the differences in expression over time. The examination of the time series revealed that there appears to be a rhythm to the expression of PKS and NRPS genes and that, strikingly, one of the most highly expressed PKS/NRPS gene clusters in M. aeruginosa has not been linked to a molecule. This is especially interesting from an ecological perspective, as one of the most highly expressed PKS/NRPS pathways has yet to be associated with a product.
One of the cool things about science is that it can be predictive. In an experiment on photosynthetic bacteria, you would hope that your expression data reflect the idea that photosynthetic life uses light to photosynthesize, and that the genes that code for the machines that harness light would be most highly expressed during the day. We call that the “sanity check,” and it came out very nicely in our metatranscriptomic data, showing that photosynthesis related genes cycle in the environment and are highly correlated with the day night cycle. Our observation that the things we expected to be highly expressed were highly expressed gave us confidence that our data may have other patterns that we would not necessarily think to look for. We started to look at broader categories of functional genes for the top four phyla. From this analysis we noticed that some phyla were enriched for particular genes relative to other phyla, which in turn allowed us to make some ecological predictions about how each group might be functioning in the bloom community. For example, look at figures 4 and 5 in the paper and you can see that Actinobacteria are mostly transporting photosynthetically derived carbohydrates but Bacteroidetes groups are mostly transporting peptides; furthermore, groups within the Proteobacteria are expressing most of the motility and chemotaxis related genes.
Quantifying natural microbial communities remains a significant challenge and, more importantly, identifying ecological functions for phenotypes promises to provide microbial evolutionary biologists with crucial data to learn about the evolution of bacteria. Imagine trying to study the evolution of a hand if you had no idea of the ecological function of the hand.
Problem Solving: paired end reads
One of the important decisions we had to make before starting the analysis of the Illumina data centered on the state of paired end sequencing in metatranscriptomics. Paired end sequencing is a great boon for Illumina sequencing, and Illumina sequencing created a huge opportunity for the field of metagenomics. But paired Illumina reads that do not collapse into one can represent a large portion of an Illumina sequence run, despite efforts to create fragments short enough for the reads to overlap yet large enough to make paired end sequences more informative. Paired ends can complicate matters because the two reads may represent two genes in one operon, or two genes from different operons, which is a problem for analyses trying to assign function to reads. The other issue is that, when assigning taxonomy to reads, similar sequences may by chance alone match different organisms even though they are part of a pair. MEGAN tries to deal with this by increasing bit scores for sequences that match the same thing. We made the decision to use paired information to improve the confidence in function assignment in MEGAN if both reads hit the same gene, and treated the 1 and 2 reads as separate for counting total reads matching a gene if the read counts were not to be normalized to gene length. Another aspect of the study focused on calculating expression for genes from the bloom former M. aeruginosa using RPKM, which does take gene length into account; there we decided to treat the 1 and 2 reads as technical replicates, calculated an RPKM for each, and averaged the values.
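To make the RPKM step concrete, here is a minimal Python sketch of the idea of treating the 1 and 2 reads as technical replicates. It is not the code used for the paper. It uses the standard RPKM definition (reads per kilobase of gene per million mapped reads), and the function names, the dictionary layout, and the numbers in the usage note are all hypothetical, chosen only for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads (standard definition)."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def mean_rpkm_across_mates(counts_r1, counts_r2, gene_lengths, total_r1, total_r2):
    """Average the RPKM computed separately from the 1 reads and the 2 reads.

    counts_r1, counts_r2: dicts mapping gene name -> reads mapped from that mate
    gene_lengths:         dict mapping gene name -> gene length in bp
    total_r1, total_r2:   total mapped reads for each mate (assumed inputs)
    """
    averaged = {}
    for gene, length in gene_lengths.items():
        # A gene with no hits from one mate counts as zero for that mate.
        value_r1 = rpkm(counts_r1.get(gene, 0), length, total_r1)
        value_r2 = rpkm(counts_r2.get(gene, 0), length, total_r2)
        averaged[gene] = (value_r1 + value_r2) / 2
    return averaged
```

With made-up numbers, a 1,000 bp gene that collects 50 of 1,000,000 mapped 1 reads and 70 of 1,000,000 mapped 2 reads has RPKMs of 50 and 70, so `mean_rpkm_across_mates({"gene_1": 50}, {"gene_1": 70}, {"gene_1": 1000}, 1_000_000, 1_000_000)` gives 60 for `gene_1`. Averaging rather than summing keeps the two mates from being counted as independent evidence for the same fragment.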
This experiment has given us the first glimpse at expression of toxin genes in a natural setting and provided us with some clues of microbial phylum level interplay. The next experiment to further test our observations includes a greater sampling effort over two day night cycles, at a greater frequency, with replicates, and with sampling at the surface and subsurface. This work is being done in collaboration with another research group interested in Microcystis and harmful algal blooms at the National University of Singapore, led by Prof. Karina Gin. It is known that M. aeruginosa strains migrate up and down in the water column, and we want to check whether some of our cyclic observations relate to different strains being present at the surface throughout the day. A follow up study in progress is to look at the reservoir community during non-bloom conditions and run perturbations to identify the effects of the addition of nitrate, phosphate, and microcystin on the microbial community, in hopes of learning whether there are expression patterns that show how Microcystis is able to bloom.
The exact story behind the paper will be better understood if it is supplemented with a brief background about my introduction to genomics and microbial ecology, which mainly occurred after starting work as a Research Associate for Jonathan. Looking back “many years ago,” I had just finished up an undergrad degree at UCSB in Aquatic Biology and I was looking for a job as a scientist when I met Jonathan. It was really my first meetings with Jonathan that set my way forward in research. I wanted to learn about how things evolve and the ecological functions of traits, and Jonathan wanted to understand how all life evolved, which meant he was studying the genomes of microbes. In our first meeting we discussed how genomics and methods associated with genomics, namely 16S rRNA gene community studies, were going to allow us to learn all about microbial ecosystems and even allow us to do in situ ecological studies of microbes (the term metagenomics was not widely known or used at this time). As TIGR slowly evolved into JCVI, I began my move to grad school to work in Paul Jensen’s lab at Scripps Institution of Oceanography, which had recently sequenced the genomes of a couple of species of marine actinomycetes. In grad school I spent a lot of time learning about natural products and the genomes of a famous group of organisms called Actinomycetes, which make about 80% of the antibiotics we take today. By the time I finished grad school I had become acutely interested in learning about the expression of natural products related genes in a natural setting.
Our latest paper published in ISME reflects a combination of my exposure to some very different fields of scientific research, from studying genomics and community diversity at The Institute for Genomic Research (TIGR) to my PhD work in natural products research at Scripps and now my studies on community gene expression dynamics in harmful algal blooms at MIT. I have been researching the ideas about in situ microbial ecology that Jonathan discussed with me those many years ago, and in this paper I continue to expand our knowledge about what microbes are doing in natural settings.
Of course I did not do this paper in a vacuum at MIT. Prof. Janelle Thompson organized the data collection, co-wrote the paper and taught me a lot about the appropriate statistics we needed to use to analyze our data and interpret the results. Graduate students Tim Helbig and Sonia Timberlake helped me get going on the computer clusters here at MIT. One of my favorite parts of moving institutions is learning the ins and outs of new computer clusters. I have been funded as a postdoctoral associate at MIT and subsequently by the NSF post-doctoral fellowship intersection of math and biology during this research. Singapore CENSAM/SMART has supported our travels to Singapore along with sequencing costs.