Annotation Databases

MG-RAST allows you to view the annotation of your data using several different annotation pipelines/databases. So, we had a discussion about them. Each database/tool was tackled by a different person:

1. GenBank/RefSeq – Joe
2. SEED/Subsystems – Jenna
3. COG/NOG/eggNOG – Tyler
4. KEGG/KO – Megan
5. SwissProt/TrEMBL – Kate
6. IMG – Guillaume
7. PATRIC – Sima
8. GO – David

I’m hoping that everyone will be so kind as to post a summary of their database here, as a reply to this blog post.

We kept coming back to the same point: which database is right for you depends on the biological question you are hoping to address. As a test dataset, we are currently using samples of a microbial mat from Lake Fryxell in Antarctica. Kate, Tyler, and Megan will provide us with a few interesting questions that we might be able to address using their data, and then we will all spend some time playing around with the annotation results from the different databases. How does the biological interpretation of the data change with respect to the annotation database used? We will discuss this next week.
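
If you want a starting point for that comparison, here is a minimal sketch in Python. It assumes you have already exported the annotation results and reshaped them into a dictionary that maps each read to the label each database gave it. The read IDs, labels, and the `summarize` helper are all made up for illustration; this is not MG-RAST's export format or API. For each database, the sketch reports the fraction of reads that received any annotation and the most common functional category.

```python
from collections import Counter

# Hypothetical toy input (not real MG-RAST output):
# read ID -> {database name: functional label, or None if unannotated}.
annotations = {
    "read_1": {"SEED": "Photosynthesis", "KEGG": "Energy metabolism", "COG": "C"},
    "read_2": {"SEED": None, "KEGG": "Carbohydrate metabolism", "COG": "G"},
    "read_3": {"SEED": None, "KEGG": None, "COG": None},
    "read_4": {"SEED": "Nitrogen Metabolism", "KEGG": "Energy metabolism", "COG": None},
}


def summarize(annotations):
    """Return {database: (fraction of reads annotated, Counter of labels)}."""
    # Collect every database name that appears for any read.
    databases = sorted({db for hits in annotations.values() for db in hits})
    summary = {}
    for db in databases:
        # A read that lacks an entry for this database counts as unannotated.
        labels = [hits.get(db) for hits in annotations.values()]
        assigned = [label for label in labels if label is not None]
        summary[db] = (len(assigned) / len(labels), Counter(assigned))
    return summary


summary = summarize(annotations)
for db, (fraction, counts) in summary.items():
    print(f"{db}: {fraction:.0%} of reads annotated, most common: {counts.most_common(1)}")
```

Keep in mind that the category names are not comparable across databases without a mapping, so the fraction annotated is the only number here that can be compared directly.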

Author: jennomics

I am a Postdoc in Jonathan Eisen's Lab at UC Davis. jennomics@gmail.com

2 thoughts on “Annotation Databases”

  1. Back in the day, every time a new genome was sequenced, a group of researchers who worked with that organism on a daily basis would fly in from all over the world, sequester themselves in a conference room, and work together to annotate the genome. At the JGI, where I worked, these were called “Jamborees.” The SEED folks took a different approach to annotation. Instead of convening experts on an organism and annotating a genome gene by gene, they gathered experts on “Subsystems.” Subsystems are loosely defined, and can range from things like “TCA cycle” to “membrane transport.” Each subsystem has its own expert, and as a new genome sequence becomes available, the experts pounce on it, looking to see if it has some or all of the components of their favorite subsystem.

    This approach results in an annotation database that has very complete, accurate, manually curated entries for all of the subsystem components, but genomes and metagenomes annotated with it may be somewhat under-annotated relative to other databases like KEGG.

  2. PATRIC: Pathosystems Resource Integration Center

    Designed to assist scientists in infectious-disease research with:
    1. A comprehensive bacterial genomics database
    2. Associated data relevant to genomic analysis
    3. Computational tools and platforms for bioinformatics analysis

    The major features available at PATRIC fall into two categories:
    1. Organisms, genomes, and comparative genomics
    2. Recurrent integration of community-derived associated data.

    What we can do using PATRIC:
    Protein Family Sorter: Compares protein families across closely related or diverse groups of genomes and visualizes them using interactive heatmaps.
    Phylogeny viewer: Allows exploration of phylogenetic relationships using species- and genus-level coloring schemes.
    Genome metadata: Supports searching for and locating genomes of interest based on various combinations of 61 different metadata fields.
    ID Mapping: Quickly maps PATRIC identifiers to those from other prominent external databases, such as GenBank, RefSeq, UniProt, etc.
    Comparative Pathway Tool: Supports comparison of consistently annotated metabolic pathways across closely related or diverse groups of genomes and visualizes them using interactive KEGG maps and heatmaps.

    Main Reference article: http://www.ncbi.nlm.nih.gov/pubmed/21896772
    Web site: http://patricbrc.org/portal/portal/patric/Home

    Future additions to PATRIC:
    PATRIC researchers will be able to analyze and compare their own data against available data for all bacterial genomes.
