There are a number of people in and around the Eisen lab who are just starting to work with metagenomic (not 16S) data, and most (all?) of us have no idea what we’re doing! So, we’ve decided to meet weekly to walk through it together. The idea is to take an inventory of the currently available tools and pipelines for analyzing metagenomic data.
Here are Holly’s notes from the first meeting:
Spreadsheet of metagenome analysis tools – keep filling this in. Assign different people to test out different tools?
What type of questions are we going to be asking?
- How do functional inferences compare across tools? (a rough comparison sketch follows this list)
- Are specific tools better suited for specific questions (bacteria/archaea vs. eukaryotes)?
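One way to start on the first question is to reduce each tool's output to a per-read table of functional terms and measure how often the tools agree. A minimal sketch, assuming each tool's output has already been flattened to a two-column TSV of read ID and annotation term (e.g., a KEGG KO) – the file names and format here are hypothetical placeholders:

```python
from collections import defaultdict

def load_annotations(path):
    """Map read_id -> set of annotation terms from a two-column TSV."""
    anns = defaultdict(set)
    with open(path) as fh:
        for line in fh:
            read_id, term = line.rstrip("\n").split("\t")
            anns[read_id].add(term)
    return anns

def agreement(a, b):
    """Fraction of reads annotated by both tools that share >=1 term."""
    shared_reads = set(a) & set(b)
    if not shared_reads:
        return 0.0
    agree = sum(1 for r in shared_reads if a[r] & b[r])
    return agree / len(shared_reads)

tool_a = load_annotations("tool_a_ko.tsv")  # hypothetical output files
tool_b = load_annotations("tool_b_ko.tsv")
print(f"Per-read agreement: {agreement(tool_a, tool_b):.2%}")
```

This only captures agreement on reads both tools annotated; reads one tool skips entirely are a separate (and interesting) comparison.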
Testing out metagenome pipelines for different types of users, at three prospective levels (see the tier-check sketch after this list):
- Analysis on a laptop or desktop (2-20 GB memory)
- Analysis on a server (20-50 GB memory)
- Analysis on a cluster (>50 GB memory)
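For anyone testing, a quick way to see which tier your machine falls into, as a sketch using the psutil package (pip install psutil); the thresholds just mirror the list above:

```python
import psutil

gb = psutil.virtual_memory().total / 1e9  # total RAM in GB
if gb < 20:
    tier = "laptop/desktop (2-20 GB)"
elif gb <= 50:
    tier = "server (20-50 GB)"
else:
    tier = "cluster (>50 GB)"
print(f"{gb:.1f} GB RAM -> {tier} tier")
```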
There are lots of programs out there, so maybe restrict ourselves to testing the "popular" ones (look at citation counts?).
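If we do rank by citations, they can be pulled programmatically rather than by hand. A hedged sketch using the Europe PMC REST API (requires the requests package); the titles in the loop are just placeholder examples of tool papers, and you'd want to search by the exact paper title or PMID in practice:

```python
import requests

def cited_by(title):
    """Return citedByCount for the best title match in Europe PMC."""
    url = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
    params = {"query": f'TITLE:"{title}"', "format": "json", "pageSize": 1}
    hits = requests.get(url, params=params).json()["resultList"]["result"]
    return hits[0].get("citedByCount", 0) if hits else None

for tool_paper in ["MG-RAST", "MEGAN"]:  # placeholder tool names
    print(tool_paper, cited_by(tool_paper))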
What are the differences between the annotation databases? COG vs. KEGG vs. other ontologies.
Samples we can use for benchmarking and testing
- Mock community metagenomes
- Published data – Yatsunenko et al. metagenome data (see the SRA search sketch below)
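For published data like the Yatsunenko et al. study, the runs live in NCBI SRA. A minimal sketch for locating them with Biopython's Entrez module; the search term is illustrative and you'd substitute the study's actual accession once we track it down:

```python
from Bio import Entrez

Entrez.email = "you@example.edu"  # NCBI asks for a contact address
handle = Entrez.esearch(db="sra", term="Yatsunenko gut metagenome", retmax=20)
record = Entrez.read(handle)
handle.close()
print(f"{record['Count']} matching SRA records")
print(record["IdList"])  # UIDs to feed into efetch/elink for run info
```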
Guillaume notes that the corn samples didn't have enough sampling points (~40 samples) for statistical analysis.
