Tag Archives: Metagenomics Discussion Group

installing STAMP on a Mac

Since this was such a huge pain in the ass for so many of us, I figured I’d share what finally worked for me.

First this:
pip install STAMP

then I got an error about matplotlib, so then this:
pip install matplotlib

Now, I type STAMP and it launches.

Of course, I did a hundred other things before I tried this, any number of which may or may not have contributed to the ease of this solution. But, if you’re still trying to get STAMP installed, give this a try.
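If `STAMP` still refuses to launch with an import error, you can ask Python directly which modules it cannot find. This is a minimal sketch using the standard library's `importlib.util.find_spec`. It assumes the error is a missing module (as with matplotlib here) rather than a broken install. Run it with the same Python interpreter that pip installs into, otherwise the answer can be misleading.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found on this interpreter's path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: STAMP's launch error came from a missing matplotlib module.
    missing = missing_modules(["matplotlib"])
    if missing:
        print("Missing:", ", ".join(missing), "-> try: pip install " + " ".join(missing))
    else:
        print("matplotlib found; STAMP should be able to import it.")
```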

“pip install STAMP” was a suggestion by Tracy Teal, btw. Can’t wait for her and Titus to get to Davis!

Mendeley groups on environmental PCR, metagenomics, and microbial eukaryotes

As part of my NSF Research Coordination Network grant (RCN EukHiTS), I am currently managing a number of Mendeley groups that amalgamate relevant journal articles on different topics related to environmental PCR, metagenomics, and microbial eukaryotes. These groups are open (anyone can join with a Mendeley account), and I’m trying to keep them regularly updated with new articles (Mendeley members can also add articles, which I strongly encourage!):

  • Eukaryotic HTP Studies – Publications relevant to high-throughput environmental sequencing approaches focused on microbial eukaryotes. Articles will include any type of -Omic methods (marker gene amplicons, metagenomics, metatranscriptomics, etc.), eukaryote-focused tools/pipelines, and review/opinion pieces.
  • rRNA in Eukaryotes – Literature related to the ribosomal repeat array in eukaryotic genomes – variation in rRNA gene copy number, intragenomic polymorphisms, concerted evolution, transposable elements and their evolutionary and ecological implications.
  • Environmental PCRs – primer sets and bias – Literature related to primer set usage and bias across all taxonomic groups (bacteria, archaea, fungi and microbial eukaryotes) – includes primer sets and methods focused on 16S, 18S, ITS, other rRNA, COI, and other marker genes used for environmental sequencing.
  • eDNA in aquatic ecosystems – This group focuses on environmental DNA (eDNA) applications in aquatic ecosystems, including use of eDNA in bioassessment and environmental monitoring. Literature collection covers methods, analytical tools, and empirical studies (both basic and applied science).

Annotation Databases

MG-RAST allows you to view the annotation of your data using several different annotation pipelines/databases. So, we had a discussion about them. Each database/tool was tackled by a different person:

1. GenBank/RefSeq – Joe
2. SEED/Subsystems – Jenna
3. COG/NOG/eggNOG – Tyler
4. KEGG/KO – Megan
5. SwissProt/trEMBL – Kate
6. IMG – Guillaume
7. PATRIC – Sima
8. GO – David

I’m hoping that everyone will be so kind as to post a summary of their database here, as a reply to this blog post.

We kept coming back to the point that which database is right for you depends on what biological question you are hoping to address. As a test dataset, we are currently using samples of a microbial mat from Lake Fryxell in Antarctica. Kate, Tyler, and Megan will provide us with a few interesting questions that we might be able to address using their data, and then we will all spend some time playing around with the annotation results from the different databases. How does the biological interpretation of the data change with respect to the annotation database used? Next week, we will discuss this.
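A simple first step toward that question is to compare which reads each database manages to annotate at all. The annotation IDs themselves (a COG ID vs. a KO ID, say) are not directly comparable. This is a minimal sketch that assumes each database's per-read annotations have already been loaded into a Python dict (read ID to annotation). How you export them from MG-RAST is not covered here, and the read IDs and annotations below are invented.

```python
def annotation_overlap(db_a, db_b, all_reads):
    """Compare which reads two annotation sources managed to label."""
    a = {r for r in all_reads if db_a.get(r)}  # reads with a non-empty annotation in A
    b = {r for r in all_reads if db_b.get(r)}  # reads with a non-empty annotation in B
    union = a | b
    return {
        "frac_a": len(a) / len(all_reads),
        "frac_b": len(b) / len(all_reads),
        "jaccard": len(a & b) / len(union) if union else 0.0,
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Hypothetical per-read annotations from two databases.
reads = ["r1", "r2", "r3", "r4"]
cog = {"r1": "COG0001", "r2": "COG0002"}
ko = {"r2": "K00001", "r3": "K00002"}
print(annotation_overlap(cog, ko, reads))
```

Reads that only one database can label are a natural place to start looking for differences in biological interpretation.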

Metagenomics Working Group – notes from 4/12/13

Guillaume ran us through the MG-RAST interface

Question that came up again (to be asked at the QIIME workshop) – what is the difference between PCA and PCoA?
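A short answer to that question: PCA works on the original variables by eigendecomposing their covariance. PCoA (principal coordinates analysis, also called classical multidimensional scaling) starts from any sample-by-sample distance matrix, double-centers the squared distances, and eigendecomposes the result. With Euclidean distances the two give the same sample scores up to sign. With ecological distances such as Bray-Curtis or UniFrac, only PCoA applies, because PCA cannot take a distance matrix. The pure-Python sketch below checks the Euclidean case on an invented 5-sample, 2-variable table. It uses power iteration only to stay dependency-free; a real analysis would use a proper eigensolver.

```python
import math


def top_eigenpair(m, iters=500):
    """Leading eigenpair of a symmetric positive semi-definite matrix (power iteration)."""
    n = len(m)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, non-constant starting vector
    for _ in range(iters):
        w = [sum(m[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    mv = [sum(m[i][k] * v[k] for k in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, mv)), v


def pcoa_axis1(points):
    """PCoA axis 1 from squared Euclidean distances via Gower double-centering."""
    n = len(points)
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    row_means = [sum(row) / n for row in d2]
    grand = sum(row_means) / n
    # G = -1/2 * J D^2 J; D^2 is symmetric, so column means equal row means.
    gower = [[-0.5 * (d2[i][j] - row_means[i] - row_means[j] + grand)
              for j in range(n)] for i in range(n)]
    lam, u = top_eigenpair(gower)
    return [x * math.sqrt(lam) for x in u]


def pca_axis1(points):
    """PCA axis 1 scores: project centered data onto the top eigenvector of Xc^T Xc."""
    n, p = len(points), len(points[0])
    means = [sum(row[j] for row in points) / n for j in range(p)]
    xc = [[row[j] - means[j] for j in range(p)] for row in points]
    scatter = [[sum(r[a] * r[b] for r in xc) for b in range(p)] for a in range(p)]
    _, v = top_eigenpair(scatter)
    return [sum(r[j] * v[j] for j in range(p)) for r in xc]


# Hypothetical table: 5 samples x 2 variables.
samples = [[1, 2], [2, 3], [3, 5], [4, 4], [6, 7]]
print([round(x, 3) for x in pcoa_axis1(samples)])
print([round(x, 3) for x in pca_axis1(samples)])  # same values, possibly sign-flipped
```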

Canonical Correlation Analysis – a way to put vectors on your PCA to explain patterns in terms of metadata.
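One simple way to get such a metadata arrow is to correlate the metadata variable with each axis's sample scores. The pair of correlations then gives the arrow's direction and length. This is post-hoc vector fitting, a simpler relative of canonical methods (which constrain the ordination axes themselves), not canonical correlation analysis proper. The scores and the "temperature" variable below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def metadata_arrow(axis1, axis2, variable):
    """Arrow for one metadata variable: (r with axis 1, r with axis 2)."""
    return pearson(axis1, variable), pearson(axis2, variable)


# Hypothetical sample scores on the first two ordination axes (four samples)
# and one hypothetical metadata variable measured on the same samples.
axis1 = [1.0, 2.0, 3.0, 4.0]
axis2 = [1.0, -1.0, 1.0, -1.0]
temperature = [2.0, 4.0, 6.0, 8.0]
print(metadata_arrow(axis1, axis2, temperature))
```

Here the arrow points almost straight along axis 1, meaning samples spread along that axis largely track the metadata variable.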

I suggested that we start putting up little tutorials about how to do different

Next week we will talk about annotation databases – please skim through some papers (and share them!) about the differences between COGs, KEGG, etc.

Metagenomics Working Group

There are a number of people in and around the Eisen lab who are just starting to work with metagenomic (not 16S) data, and most (all?) of us have no idea what we’re doing! So, we’ve decided to meet weekly to walk through it together. The idea is to take an inventory of the currently available tools and pipelines for analyzing metagenomic data.

Here are Holly’s notes from the first meeting:


Spreadsheet of metagenome analysis tools – keep filling this in. Assign different people to test out different tools?

What type of questions are we going to be asking?

  • How do functional inferences compare across tools?

  • Are specific tools better suited for specific questions (bacteria/archaea vs. eukaryotes)?

Testing out metagenome pipelines for different types of users. Three prospective levels:

  1. Analysis on a laptop or desktop (2-20 GB memory)

  2. Analysis on a server (20-50 GB memory)

  3. Analysis on a cluster (>50 GB memory)
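To check which of these three levels a given machine falls into, here is a minimal sketch. It reads physical RAM through POSIX `os.sysconf` (available on Linux and, as far as I know, macOS, but not on Windows) and maps the value onto the levels above. Where the ranges touch (20 GB, 50 GB), the sketch assigns the boundary to the lower level, and anything under 2 GB is lumped into level 1. Both choices are assumptions, not something the group decided.

```python
import os


def total_memory_gb():
    """Physical RAM in GB (bytes / 1024**3) via POSIX sysconf; None where unsupported."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        return None


def analysis_level(memory_gb):
    """Map memory to the three levels in the notes (boundaries go to the lower level)."""
    if memory_gb <= 20:
        return 1  # laptop or desktop (2-20 GB)
    if memory_gb <= 50:
        return 2  # server (20-50 GB)
    return 3      # cluster (>50 GB)


if __name__ == "__main__":
    mem = total_memory_gb()
    if mem is None:
        print("Could not read physical memory on this platform.")
    else:
        print(f"{mem:.1f} GB -> level {analysis_level(mem)}")
```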

Maybe restrict ourselves to testing out “popular” programs (look at citation counts?). Lots of programs out there.

What are the differences between databases? COG vs. KEGG vs. other ontologies

Samples we can use for benchmarking and testing

  • Mock community metagenomes

  • Published data – Yatsunenko et al. metagenome data

Guillaume notes that the Corn samples didn’t have enough sample points to do statistical analysis (~40 samples).
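Whether ~40 samples is enough depends on how big a difference you are trying to detect and which test you use. As a rough, generic illustration, this sketch computes the standard normal-approximation sample size for comparing two group means, n per group = 2((z_{1-α/2} + z_{1-β}) / d)², using only the standard library. The effect sizes d are arbitrary examples, and real metagenomic count data usually call for other tests.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.8, 0.5, 0.2):
    print(d, samples_per_group(d))
```

With the defaults (alpha = 0.05, power = 0.8), a medium effect of d = 0.5 works out to 63 samples per group, so small effects quickly push the requirement well past a few dozen samples.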