Metagenomics Working Group – notes from 4/12/13

Guillaume ran us through the MG-RAST interface

Question that came up again (to be asked at the QIIME workshop) – what is the difference between PCA and PCoA?
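One partial answer we can check ourselves while waiting for the workshop: PCA starts from the sample-by-feature table, while PCoA starts from a sample-by-sample distance matrix. A minimal sketch, assuming NumPy and a made-up random table (the function names `pca_scores` and `pcoa_scores` are ours, not from any package), suggests that the two give the same ordination when the distances are Euclidean. With other distances (Bray-Curtis, UniFrac, etc.) the two would generally differ.

```python
import numpy as np

def pca_scores(X, k=2):
    """PCA sample scores from a samples-by-features table (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]                      # coordinates on the first k axes

def pcoa_scores(D, k=2):
    """Classical PCoA sample scores from a samples-by-samples distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:k]          # keep the k largest eigenvalues
    evals, evecs = evals[order], evecs[:, order]
    return evecs * np.sqrt(np.clip(evals, 0, None))

# Hypothetical toy data: 6 samples, 4 features
rng = np.random.default_rng(0)
X = rng.random((6, 4))

# Pairwise Euclidean distances between samples
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

pca = pca_scores(X)
pcoa = pcoa_scores(D)

# Each axis is only defined up to a sign flip, so compare absolute values
same = np.allclose(np.abs(pca), np.abs(pcoa), atol=1e-8)
print("PCA and PCoA agree on Euclidean distances:", same)
```

So the practical difference seems to be in the input: PCoA lets you pick whatever distance measure suits the data, and PCA is the special case where that distance is Euclidean.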

Canonical Correlation Analysis – a way to put vectors on your PCA to explain patterns in terms of metadata.
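To make the "vectors on your PCA" idea concrete, here is a rough sketch of fitting one metadata variable onto a two-axis ordination. Note that this is plain least-squares vector fitting, not canonical correlation analysis itself; the function name `fit_metadata_vector` and the toy data are assumptions made up for illustration.

```python
import numpy as np

def fit_metadata_vector(scores, meta):
    """Fit one metadata variable onto ordination scores.

    scores: samples-by-axes array of ordination coordinates
    meta:   one metadata value per sample
    Returns the unit arrow direction and the R^2 of the fit.
    """
    S = scores - scores.mean(axis=0)             # center the ordination axes
    y = meta - meta.mean()                       # center the metadata variable
    coef, *_ = np.linalg.lstsq(S, y, rcond=None) # regress metadata on the axes
    fitted = S @ coef
    r2 = (fitted @ fitted) / (y @ y)             # share of variance explained
    direction = coef / np.linalg.norm(coef)      # arrow direction in the plot
    return direction, r2

# Hypothetical example: 10 samples whose metadata tracks the first axis exactly
rng = np.random.default_rng(1)
scores = rng.normal(size=(10, 2))
meta = 3.0 * scores[:, 0]

direction, r2 = fit_metadata_vector(scores, meta)
print(direction, r2)
```

The arrow points in the direction along which the metadata variable increases fastest, and R^2 says how much of that variable the two plotted axes account for.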

I suggested that we start putting up little tutorials about how to do different

Next week we will talk about annotation databases – please skim through some papers (and share them!) about the differences between COG, KEGG, etc.

Metagenomics Working Group

There are a number of people in and around the Eisen lab who are just starting to work with metagenomic (not 16S) data, and most (all?) of us have no idea what we’re doing! So, we’ve decided to meet weekly to walk through it together. The idea is to take an inventory of the currently available tools and pipelines for analyzing metagenomic data.

Here are Holly’s notes from the first meeting:

 

Spreadsheet of metagenome analysis – keep filling this in. Assign different people to test out different tools?

What type of questions are we going to be asking?

  • How do functional inferences compare across tools?

  • Are specific tools better suited for specific questions (bacteria/archaea vs. eukaryotes)?

Testing out metagenome pipelines for different types of users. Three prospective levels:

  1. Analysis on a laptop or desktop (2-20 GB memory)

  2. Analysis on a server (20-50 GB memory)

  3. Analysis on a cluster (>50 GB memory)

Maybe restrict ourselves to testing out “popular” programs (look at citation counts?). Lots of programs out there.

What are the differences between databases? COG vs. KEGG vs. other ontologies

Samples we can use for benchmarking and testing

  • Mock community metagenomes

  • Published data – Yatsunenko et al. metagenome data

Guillaume notes that the Corn samples didn’t have enough sample points to do statistical analysis (~40 samples)