Bad microbiology reporting of the month award: C-Net on IBM "Sequencing the City" meeting

Well, I am still really annoyed by this unbearable article on C-Net yesterday: IBM sees big opportunity in sequencing microbes by Daniel Terdiman.  The article is about this “Sequencing the City” meeting organized by IBM that was on Tuesday and Wednesday.  I talked at the meeting on Tuesday (I could not go on Wednesday).  For more about my talk see: What to do when you realize the meeting you are speaking at is a YAMMM (yet another mostly male meeting)?.  But I am not criticizing the meeting here.  I am criticizing the article in C-Net which has many many flaws. For example consider:

According to James Kaufman, a research manager at the Almaden Research Center, the move to study metagenomics — the study of systems of micro-organisms — came from what he called a tipping point in big data. As more and more government-funded institutions study organisms and bacteria, they’ve collected more information about them, and submitted much of their work to centralized databases. “So there’s a growing library of genomes across the field of life,” Kaufman said. “That made possible metagenomics.”

What?  Metagenomics has been around for a long time.  Sure, many people in the field are taking advantage of so-called big data, but there was no “tipping point” needed to launch the field.  This is just completely misguided.
And then, even worse:

The result: We can now look at and understand whole ecosystems at the bacterial level. One example of how that manifests is what IBM refers to as the Human Microbiome Project. According to an IBM document, that’s about characterizing “microbial communities found at multiple human body sites to discover correlations between changes in the microbiome with changes in human health.”

So – there have been dozens of high profile papers from the Human Microbiome Project.  There are hundreds of web pages with information about the project.  It was started years and years ago.  And the reporter quotes an “IBM document” to tell us what the Human Microbiome Project is?   And even worse the reporter says “what IBM refers to as the Human Microbiome Project” like they ran it / designed it.  Good that they refer to it as the Human Microbiome Project.  You know why?  Because that is what it is known as to all the other $(&@)(* people in the whole (%&# world.

The reporter goes on to write

This kind of work is not entirely new, but the scientists who will be gathering at IBM Research this week are grappling with one conundrum: they don’t know what they don’t know. So a big topic of conversation, and a big part of what IBM would like to see advanced, is “the ability to do metagenomics on the scale of a city or the world….That will depend on software services available in the cloud,” Kaufman said. “It has to be cheap, easy, and accessible from anywhere. That’s what we’re really good at.”

Once again making it seem like IBM is somehow leading this field.  Not to pick on IBM here.  I am glad they organized the meeting.  But either the reporter just got handed a press release from IBM and wrote it up, or did not do any type of background research, or both.  Sure IBM would like to see this.  But so would lots of other people.  Why make this all about IBM?  There are so many people who have done interesting work in the area of “microbiology of the built environment” – why are none of them even discussed?  What exactly is the point of this article if not to simply be a PR piece for IBM?  Aaaaaarg.

UPDATE 5/9 Storify of some of the Tweets about the meeting

New EisenLab paper: PhyloSift: phylogenetic analysis of genomes and metagenomes [PeerJ]

New paper from people in the Eisen lab (and some others): PhyloSift: phylogenetic analysis of genomes and metagenomes [PeerJ].  This project was coordinated by Aaron Darling, who was a Project Scientist in my lab and is now a Professor at the University of Technology Sydney.  Also involved were Holly Bik (post doc in the lab), Guillaume Jospin (Bioinformatics Engineer in the lab), Eric Lowe (was a UC Davis undergrad working in the lab) and Erick Matsen (from the FHCRC).  This work was supported by a grant from the Department of Homeland Security.

Abstract:

Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection.

In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata.

These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).

For more about Phylosift see

Mini journal club: staged phage attack on the humanized microbiome of a mouse

Doing another mini journal club here.  Just got notified of this paper through some automated Google Scholar searches: Gnotobiotic mouse model of phage–bacterial host dynamics in the human gut

Full citation: Reyes, A., Wu, M., McNulty, N. P., Rohwer, F. L., & Gordon, J. I. (2013). Gnotobiotic mouse model of phage–bacterial host dynamics in the human gut. Proceedings of the National Academy of Sciences, 201319470.

The paper seems pretty fascinating at first glance. Basically they built on the Jeff Gordon germ-free mouse model and introduced a defined set of cultured microbes that came from humans.  And then they staged a phage attack on the system and monitored the response of the community to the phage attack.

Figure 1 from Reyes et al.

They (of course) also did a control – in this case with heat-killed phage.  And they compared that to what happened with the live phage.  I love this concept as they are able to control the microbial community and then test dynamics of how specific phage affect that community inside a living host.  Very cool.

Who are the microbes in your neighborhood? Quite a few are from Melainabacteria – a new phylum sister to Cyanobacteria

I just love this paper … The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to Cyanobacteria | eLife from the labs of Ruth Ley and Jill Banfield (the co-first authors are Sara C. Di Rienzi and Itai Sharon).  It represents a landmark study in something that has intrigued many microbial diversity / human microbiome researchers for many years.  Early in the history of sequencing rRNA genes from human microbiome samples, researchers discovered something a bit weird – quite a few sequences were coming from what appeared to be close relatives of Cyanobacteria.  This was weird because all known Cyanobacteria were thought to be photosynthetic and – well – there is not too much light in the human gut.

Now – one possible explanation for this was that these sequences were coming from photosynthetic bacteria but these bacteria were not residents of the human gut but came via consumable items (i.e., food and drink).  Perhaps they were actually from chloroplasts of something in the diet (after all – chloroplasts are derived versions of cyanobacteria). This idea was discussed at many meetings I attended.  But there was no evidence for this.  Another possibility was that there was in fact some light in the human gut – leaking through from the outside or being produced from the inside. And perhaps this was enough to do a little photosynthesis.  Sound crazy?  Well, not so crazy after reports of photosynthesis in the deep sea.  A third possibility was that these sequences were coming from residents of the human gut that were related to (or even within) cyanobacteria but were not photosynthetic.  More detail on possible explanations is in this new paper and in some of the material cited therein.

Anyway – Ruth Ley has been discussing these unusual sequences for years and now in this paper her group and the group of Jill Banfield at Berkeley (along with some others) have used metagenomics and detailed assembly and phylogenetic analysis to reveal many new insights into these sequences.  I could write much more about this.  But, I think the paper really speaks for itself.  And it is open access so anyone and everyone can check it out.  And you should.  It is wonderful.

Fig 2 from Di Rienzi et al.

UPDATED 10/9/2013 to correct that there were co-first authors

Great use of metagenomic data: community wide adaptation signatures

OK I have been dreaming about doing something like this for many years.  One of the potentially most useful aspects of shotgun metagenomic data is that you get a sample of many/all members of a microbial community at once.  And then in theory one could look across different species and taxa and ask – do they all have similar adaptations in response to some sort of environmental pressure.  There have been a few papers on this over the last few years (e.g. check out this one from Muegge et al on Diet Driving Convergence in Gut Microbes).  But this new paper is really the type of thing I have been hoping to see: Environmental shaping of codon usage and functional adaptation across microbial communities.  Basically they looked at codon usage in organisms in different metagenomic samples and found major metagenome specific signatures, suggesting that different taxa were in essence converging on common codon usage.
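
To make the idea a bit more concrete, here is a minimal Python sketch of the basic measurement involved: relative codon frequencies across a set of coding sequences, plus a crude distance between two usage profiles.  This is only an illustration.  The function names and the toy sequences are made up by me, and the paper itself uses its own, more careful statistics.

```python
from collections import Counter
from itertools import product

# All 64 possible codons over the unambiguous DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_usage(cds_list):
    """Relative frequency of each codon across in-frame coding sequences."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper()
        # Step through the sequence in frame; drop any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Skip codons containing ambiguous bases such as N.
            if set(codon) <= set("ACGT"):
                counts[codon] += 1
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CODONS}
    return {c: counts[c] / total for c in CODONS}


def usage_distance(u, v):
    """Manhattan distance between two codon usage profiles (0 means identical)."""
    return sum(abs(u[c] - v[c]) for c in CODONS)


# Hypothetical toy "samples": the same peptides encoded with different synonymous codons.
sample_a = ["ATGGCCGCCAAA", "ATGGCCAAA"]
sample_b = ["ATGGCTGCTAAG", "ATGGCTAAG"]

profile_a = codon_usage(sample_a)
profile_b = codon_usage(sample_b)
print(round(usage_distance(profile_a, profile_b), 3))
```

In a real analysis one would compute a profile like this per genome or per taxonomic bin within each metagenome, and then ask whether profiles cluster by sample rather than by taxon.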

The paper is definitely worth a look.

Guest post from Kimmen Sjölander about FAT-CAT phylogenomics pipeline

Below is a guest post from my friend and colleague Kimmen Sjölander, Prof. at UC Berkeley and phylogenomics guru. 


Announcing the FAT-CAT phylogenomic annotation webserver.

FAT-CAT is a new web server for phylogenomic prediction of function and ortholog identification and for taxonomic origin prediction of metagenome sequences using HMM-based classification of protein sequences to >93K pre-calculated phylogenetic trees in the PhyloFacts database. PhyloFacts is unique among phylogenomic databases in having both broad taxonomic coverage – more than 7.3M proteins from >99K unique taxa across the Tree of Life, including targeted coverage of genomes from Eukaryotes, Bacteria and Archaea – and integrating functional data on trees for Pfam domains and multi-domain architectures. PhyloFacts trees include functional and annotation data from UniProt (SwissProt and TrEMBL), GO, BioCyc, Pfam, Enzyme Commission and other sources. The FAT-CAT pipeline uses HMMs at all nodes in PhyloFacts trees to classify user sequences to different levels of functional hierarchies, based on the subtree HMM giving the sequence the strongest score. Phylogenetic placements within orthology groups defined on PhyloFacts trees are used to predict function and to predict orthologs. Sequences from metagenome projects can be classified taxonomically based on the MRCA of the sequences descending from the top-scoring subtree node. Because of the broad taxonomic and functional coverage, FAT-CAT can identify orthologs and predict function for most sequence inputs. We’re working to make FAT-CAT less computationally intensive so that users will be able to upload entire genomes for analysis; in the interim, we limit users to 20 sequence inputs per day. Registered users are given a higher quota (see details online). We’d love to hear from you if you have feature requests or bug reports; please send any to Kimmen Sjölander – kimmen at berkeley dot edu (parse appropriately). 
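
A note on the taxonomic step described above: assigning a metagenome sequence to the MRCA of the sequences under the top-scoring subtree node amounts to finding the deepest taxon shared by all of their lineages.  The Python sketch below shows just that one idea.  It is not FAT-CAT code, the function name is made up, and the lineages are hypothetical examples chosen for illustration.

```python
def mrca_lineage(lineages):
    """Return the longest shared prefix of root-to-tip taxonomic lineages."""
    if not lineages:
        return []
    shared = []
    # Walk down the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical lineages for the sequences descending from one subtree node.
subtree_members = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales", "Salmonella enterica"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Vibrionales", "Vibrio cholerae"],
]

assignment = mrca_lineage(subtree_members)
print(assignment[-1])  # deepest taxon shared by all members
```

The more taxonomically mixed the top-scoring subtree is, the shallower (and more conservative) the resulting assignment becomes.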

Cool new paper from DeLong lab: Pattern and synchrony of gene expression among sympatric marine microbial populations

Definitely worth looking at this paper if you are interested in uncultured microbes: Pattern and synchrony of gene expression among sympatric marine microbial populations.  From Ed DeLong and team, it is published under the “Open” pathway in PNAS.

Also see press release here: Scientists track ocean microbe populations in their natural habitat to …

Interesting new #PLOS One paper on study design in rRNA surveys

Interesting new paper in PLOS ONE: Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads: Evaluation of Effective Study Designs

Abstract: Massively parallel high throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies cause a predicament though, because although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ~8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.

Not sure I like everything in the paper.  For example, they focus on naive Bayesian classification methods … when (of course) I prefer phylogenetic methods.  But that is a small issue.  Overall there is a lot of useful detail in here about rRNA based taxonomic studies.  I note – some of this probably applies to metagenomic studies as well … perhaps this group will do a comparable analysis of metagenomics next?
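
For anyone who has not seen one, here is a toy Python sketch of what a naive Bayesian classifier over k-mers looks like.  To be clear, this is my own minimal illustration and not the method or code from the paper: the class name, the choice of k = 4, the uniform prior over taxa, and the made-up training sequences are all assumptions.  Real classifiers use longer k-mers, large curated training sets, and some form of confidence estimate.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=4):
    """All overlapping k-mers in a sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Toy multinomial naive Bayes over k-mers, with a uniform prior over taxa."""

    def __init__(self, k=4):
        self.k = k
        self.counts = defaultdict(Counter)  # taxon -> k-mer counts
        self.totals = Counter()             # taxon -> total k-mers seen

    def train(self, labeled_seqs):
        for taxon, seq in labeled_seqs:
            for km in kmers(seq, self.k):
                self.counts[taxon][km] += 1
                self.totals[taxon] += 1

    def classify(self, read):
        vocab = 4 ** self.k  # number of possible k-mers, used for add-one smoothing
        best_taxon, best_score = None, -math.inf
        for taxon in self.counts:
            # Sum of log-probabilities of the read's k-mers under this taxon.
            score = sum(
                math.log((self.counts[taxon][km] + 1) / (self.totals[taxon] + vocab))
                for km in kmers(read, self.k)
            )
            if score > best_score:
                best_taxon, best_score = taxon, score
        return best_taxon


# Hypothetical training "references" (not real 16S sequences).
clf = KmerNaiveBayes(k=4)
clf.train([
    ("TaxonA", "ACGTACGTACGTACGTACGT"),
    ("TaxonB", "TTGGCCAATTGGCCAATTGG"),
])
print(clf.classify("ACGTACGT"), clf.classify("GGCCAATT"))
```

Note that nothing here uses a tree: the read is scored against each taxon independently, which is exactly why I tend to prefer phylogenetic methods.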

Mizrahi-Man O, Davenport ER, Gilad Y (2013) Taxonomic Classification of Bacterial 16S rRNA Genes Using Short Sequencing Reads: Evaluation of Effective Study Designs. PLoS ONE 8(1): e53608. doi:10.1371/journal.pone.0053608

I note – if you want to catch up / learn / research metagenomics and phylogeny or classification check out the Mendeley group I started on the topic:

http://www.mendeley.com/groups/1152921/phylogenetic-and-related-analyses-of-metagenomic-data/widget/29/3/

Attention all metagenomicists: put your pinky in the corner of your mouth & say "1 million dollars"

Already posted this to Twitter and Facebook but had to post here too.  This is wild.  DTRA has announced a $1 million prize for metagenomic analysis: DTRA Algorithm Challenge | Landing Page.  From their page

The Prize:
As nth generation DNA sequencing technology moves out of the research lab and closer to the diagnostician’s desktop, the process bottleneck will quickly become information processing. The Defense Threat Reduction Agency (DTRA) and the Department of Defense are interested in averting this logjam by fostering the development of new diagnostic algorithms capable of processing sequence data rapidly in a realistic, moderate-to-low resource setting. With this goal in mind, DTRA is sponsoring an algorithm development challenge. 

The Challenge:
Given raw sequence read data from a complex diagnostic sample, what algorithm can most rapidly and accurately characterize the sample, with the least computational overhead?

My instinct is to keep this to myself because, well, I want to win.  But my sharing side of things won out and I am posting here.  Maybe we (i.e., the community) can develop an open, collaborative project to do this?  Just a thought …

People not Projects: the Moore Foundation continues to revolutionize marine microbiology w/ its Investigator program

People not Projects.

It is such a simple concept.  But it is so powerful.  I first became aware of this idea as it relates to funding scientific research in regard to the Howard Hughes Medical Institute’s Investigator program.  Their approach (along with a decent chunk of money) has helped revolutionize biomedical science.  And thus I was personally thrilled to see the introduction of this concept in the area of Marine Microbiology a few years back with the Gordon and Betty Moore Foundation’s “Marine Microbiology Initiative Investigator” program.  Launched in 2004 it helped revolutionize marine microbiology studies in the same way HHMI’s investigator program revolutionized biomedical studies.

The first GBMF MMI Investigator program ran from 2004 to 2012. And the people supported were pretty darn special:

Now I am, I suppose, a little biased in this because at the same time GBMF launched this program they also put a bunch of money into the general area of Marine Microbiology and I have been the recipient of some of that money.  For example, I got a small amount of money as part of the GBMF-funded work at the J. Craig Venter Institute on the Sargasso Sea and Global Ocean Sampling metagenomic sequencing projects and also had a subcontract from UCSD/JCVI to do some work as part of the “CAMERA” metagenomic database project.  I ended up being a coauthor on a diverse collection of papers associated with these projects including Sargasso metagenome and this review, and GOS1, GOS2 and my stalking the 4th domain paper.

I am also a bit biased in that I have worked with many of the people on the initial MMI Investigator list, some before and some after the awards, including papers with Jen Martiny, Ed DeLong, Alex Worden and Ginger Armbrust, and Mary Ann Moran.

But perhaps most relevant in terms of possible bias towards the Gordon and Betty Moore Foundation is that in 2007 my lab received funds through the MMI program for a collaborative project with Jessica Green and Katie Pollard for our “iSEEM” project on “Integrating Statistical, Ecological and Evolutionary analyses of Metagenomic Data” (see http://iseem.org) which was one of the most successful collaborations in which I have ever been involved.  This project produced something like a dozen papers and many major new developments in analyses of metagenomic data including 16S copy correction, sifting families, microbeDB, PD of metagenomes, WATERs, BioTorrents, AMPHORA, and STAP.  This project just ended but Katie Pollard and I just got additional funds from GBMF to continue related work.

So sure – I am biased.  But the program is simply great.  In the eight years since the initial grants the Gordon and Betty Moore Foundation has helped revolutionize marine microbiology.  And a lot of this came from the Investigator program and its emphasis on people not projects.  I note – the Moore Foundation has clearly decided that this “people not projects” concept is a good one.  A few years ago they partnered with HHMI to launch a Plant Sciences Investigator Program, which I wrote about here.

It was thus with great excitement that I saw the call for applications for the second round of the MMI Investigator program.  I certainly pondered applying.  But for many reasons I decided not to.  And today the winners of this competition have been announced and, well, it is a very impressive crew:

Some of the same crowd as the previous round.  Some new people.  Some people not there from the previous round.  All of them are rock stars in their areas, especially if one takes into account how senior they are (the more junior people are stars in development).  And all have done groundbreaking work in various areas relating to marine microbiology.  The organisms covered here run the gamut including viruses, bacteria, archaea, and microbial eukaryotes.  The areas of focus covered range from biogeochemistry to ecosystem modeling with everything in between.  It really is an impressive group. DeLong pioneered metagenomics and helped launch studies of uncultured microbes in the oceans.  Karl has led the Hawaii Ocean Time series and done other brilliant work.  Sullivan and Rohwer are pushing the frontiers of viral studies in the oceans.  Allen, Armbrust, and Worden are among the leaders in genomic studies of microbial eukaryotes in the marine environment.   Dubilier, Bidle, Fuhrman and Stocker (the original post double listed Follows here …) – though they focus on very different aspects of marine microbes – are helping lead the charge in understanding interactions across the domains of life in the marine environment.  Orphan, Saito, Deutsch, Follows and Pearson are on the cutting edge of biogeochemical studies and trying to link experimental studies of microbes to biogeochemistry of oceans.

The great thing about the “people not projects” concept is that the people funded here get to follow their own path.  They are not going to be constrained by the complications and occasional idiocy of the grant review process.  They in essence get to do whatever they want.  Freedom to follow their noses.  Or their guts.  Or whatever.  It is a refreshing concept and as mentioned above has been revolutionary in various areas of science.  There has been a slow but steady spread of the “people not projects” concept to various federal agencies too but it seems to be more of a private foundation type of strategy.  Federal agencies are so risk averse in funding that this type of concept does not work well there.  I wish there was more.  But I am at least thankful for what HHMI and GBMF and Wellcome and Sloan and other private groups are doing in this regard.  Now – sure – none of these private foundations does everything perfectly.  They have blunders here and there like everyone else.  But without a doubt I think we need more of the People not Projects concept.
Oh – and another good thing.  GBMF is quite a big supporter of Open Science in its various guises.  So one can expect much of the data, software, and papers from their funding to be widely and openly available.
It is a grand time to be doing microbiology largely due to revolutions in technology and also to changes in the way we view microbes on the planet.  It is an even grander time to be doing marine microbiology due to the dedication of the Gordon and Betty Moore Foundation to this important topic.