Two Eisen lab papers selected for the PeerJ 2015 Collection

Cool. Two papers from my lab were selected as highlights of 2014 papers published in PeerJ, as part of the PeerJ Picks 2015 Collection.

The papers were:

The microbes we eat: abundance and taxonomy of microbes consumed in a day’s worth of meals for three diet types

Jenna M. Lang, Jonathan A. Eisen, Angela M. Zivkovic

and 

PhyloSift: phylogenetic analysis of genomes and metagenomes

Aaron E. Darling, Guillaume Jospin, Eric Lowe, Frederick A. Matsen, Holly M. Bik, Jonathan A. Eisen

Thanks to PeerJ and to all the co-authors for their great work. I love open science, and I think we particularly need continued experiments on the best ways to practice it. So I like the experiment that PeerJ represents, both in how to publish and in how to pay for open access fees.

Full disclosure: I am an Academic Editor at PeerJ.


A little bit about PhyloSift: phylogenetic analysis of genomes and metagenomes

New paper from people in the Eisen lab: PhyloSift: phylogenetic analysis of genomes and metagenomes [PeerJ].

Basically, the concept behind PhyloSift is to provide high-quality, automated, high-throughput, phylogeny-driven analysis of metagenomic sequence data. The software was developed openly on GitHub and has been available in some form for more than a year. Aaron, Holly, Erick and I have discussed it extensively in talks around the world, so some readers may already be familiar with it.

This project was coordinated by Aaron Darling, who was a Project Scientist in my lab and is now a Professor at the University of Technology Sydney. Also involved were Holly Bik (a postdoc in the lab), Guillaume Jospin (a Bioinformatics Engineer in the lab), Eric Lowe (a former UC Davis undergraduate who worked in the lab) and Erick Matsen (from the FHCRC).

Abstract:

Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection.

In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata.

These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
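The abstract mentions analyzing phylogenetic diversity without naming a specific measure. Purely as a point of reference, here is a minimal sketch of the classic rooted Faith's PD: the total branch length of the part of a tree that connects a set of observed taxa to the root. It is not necessarily the diversity measure used in the paper, and the dictionary-based tree representation (`parent`, `branch_length`) is an assumption made for this example.

```python
from typing import Dict, Iterable, Optional

def faith_pd(parent: Dict[str, Optional[str]],
             branch_length: Dict[str, float],
             observed: Iterable[str]) -> float:
    """Rooted Faith's PD for a set of observed tips (illustrative sketch).

    parent maps each node to its parent (None for the root);
    branch_length maps each non-root node to the length of the edge above it.
    """
    counted = set()  # nodes whose parent edge has already been included
    for tip in observed:
        node = tip
        # Walk toward the root; stop early once we reach an edge already counted,
        # since everything above it is counted too.
        while parent[node] is not None and node not in counted:
            counted.add(node)
            node = parent[node]
    return sum(branch_length[n] for n in counted)
```

For the tree `((a:1,b:2)x:3,c:4)`, observing `a` and `b` gives 1 + 2 + 3 = 6, because the shared edge above `x` is counted only once.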

Figure 1 shows the general outline of the workflow.

Figure 1. The PhyloSift workflow.

The workflow consists of a series of steps, including:

  • Sequence identity search 
  • Alignment to reference multiple alignment 
  • Placement on a phylogenetic reference tree 
  • Visual presentation of taxonomic summary 
  • Comparison among samples (e.g., using Edge PCA)

In addition, there is a workflow for updating the database behind PhyloSift, which includes:

  • Acquiring new genome data 
  • Gene family search and alignment workflow on each genome 
  • Phylogenetic inference and pruning 
  • Selection of representatives for similarity search 
  • Taxonomic reconciliation 

The paper shows some of the things you can do with PhyloSift, along with some comparisons between PhyloSift and other methods.

Figure 2. Comparison of QIIME PCA and Edge PCA analysis of human fecal samples.

Figure 3. Lineages contributing variation in human fecal sample community structure (analyzed using Edge PCA).
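Edge PCA works on phylogenetic placements. For every edge of the reference tree, it asks how much of a sample's placement mass lies on one side of the edge versus the other, and then runs PCA on those per-edge differences across samples. Below is a minimal NumPy sketch of that idea, assuming placements have already been collapsed to a mass on each node of a rooted reference tree (positions along edges are ignored) and that nodes are numbered so parents come before children. The names `splitify` and `edge_pca` and the input layout are illustrative; this is not the implementation PhyloSift uses.

```python
import numpy as np

def splitify(parent, mass):
    """Return an (n_samples, n_nodes - 1) matrix of distal-minus-proximal mass.

    parent[v] is the index of v's parent; node 0 is the root (parent -1) and
    every child has a larger index than its parent. mass[s][v] is the
    placement mass of sample s on node v. Column j describes the edge above
    node j + 1.
    """
    mass = np.asarray(mass, dtype=float)
    # Normalize each sample so its placement masses sum to 1.
    mass = mass / mass.sum(axis=1, keepdims=True)
    subtree = mass.copy()
    # Visit nodes from highest index to lowest, pushing each subtree total
    # into its parent; children are always finished before their parent.
    for v in range(len(parent) - 1, 0, -1):
        assert 0 <= parent[v] < v, "nodes must be numbered parent-first"
        subtree[:, parent[v]] += subtree[:, v]
    total = subtree[:, :1]   # whole-tree mass per sample (1.0 after normalizing)
    distal = subtree[:, 1:]  # mass below each non-root edge
    return distal - (total - distal)

def edge_pca(parent, mass, n_components=2):
    """PCA of the splitified matrix; returns sample scores and edge loadings."""
    X = splitify(parent, mass)
    Xc = X - X.mean(axis=0)
    # Rows of vt are principal axes expressed as weights over edges.
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return u[:, :n_components] * s[:n_components], vt[:n_components]
```

Edges with large loadings on a component point to the lineages driving the separation between samples, which is the kind of information Figure 3 summarizes. The sign convention (distal minus proximal) only flips the direction of the axes.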

PhyloSift also provides Krona-based visualization of the taxonomic composition of a sample.
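Krona's text importer (`ktImportText`) reads plain text in which each line is a magnitude followed by tab-separated lineage levels. As a hypothetical illustration of turning per-read taxonomic assignments into that format (this is not how PhyloSift writes its own Krona output), the helper below, `krona_text`, sums placement mass per lineage:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def krona_text(placements: Iterable[Tuple[Tuple[str, ...], float]]) -> str:
    """Aggregate (lineage, mass) pairs into Krona text-import lines (sketch)."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for lineage, mass in placements:
        totals[lineage] += mass
    # One line per lineage: magnitude first, then one tab-separated field per rank.
    return "".join(
        "\t".join([f"{totals[lineage]:g}", *lineage]) + "\n"
        for lineage in sorted(totals)
    )
```

Saved to a file, this output can be read by `ktImportText` to build an interactive HTML chart of the sample's composition.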

Anyway, more on PhyloSift later; I just wanted to get some of it out here on the blog. Thanks to Aaron Darling, Holly Bik, Guillaume Jospin, Eric Lowe and Erick Matsen for all their hard work on this, and thanks to the Department of Homeland Security for supporting the work.

For more about PhyloSift, see:

New EisenLab paper: PhyloSift: phylogenetic analysis of genomes and metagenomes [PeerJ]

New paper from people in the Eisen lab (and some others): PhyloSift: phylogenetic analysis of genomes and metagenomes [PeerJ]. This project was coordinated by Aaron Darling, who was a Project Scientist in my lab and is now a Professor at the University of Technology Sydney. Also involved were Holly Bik (a postdoc in the lab), Guillaume Jospin (a Bioinformatics Engineer in the lab), Eric Lowe (a former UC Davis undergraduate who worked in the lab) and Erick Matsen (from the FHCRC). This work was supported by a grant from the Department of Homeland Security.

