Kimmen Sjölander talk, May 1st, 12pm, 1005 GBSF

Machine learning methods for protein function and structure prediction

Kimmen Sjölander
Associate Professor
Berkeley Phylogenomics Group
University of California, Berkeley

May 1, 2012, 12:00 p.m.
1005 GBSF Auditorium

Abstract: Theodosius Dobzhansky, the noted geneticist and evolutionary biologist, is famous for having said “Nothing in biology makes sense except in the light of evolution.” In this talk, I will discuss the explicit use of evolution as a fundamental principle in bioinformatics, using machine learning methods in combination with information from protein structure and evolution to improve the power and specificity of a number of bioinformatics tasks, including prediction of protein structure and function, ortholog identification, functional site prediction, and simultaneous estimation of multiple sequence alignments and protein superfamily phylogenies. Because many of these methods require expertise and/or computational resources not available to most experimental biologists, we provide pre-calculated phylogenetic trees for gene families in the PhyloFacts database. PhyloFacts 3.0 is a phylogenomic database of gene families across the Tree of Life. Each PhyloFacts family contains a multiple sequence alignment, phylogenetic tree, predicted orthologs, predicted pathway associations, and experimental and other annotation data. As of April 26, 2012, PhyloFacts 3.0 contains >7.3M protein sequences from >99K unique taxa (including strains) across >92K families.

Finally, I will describe our work on a fully automated system for high-throughput functional annotation of genomes and for taxonomic and functional annotation of metagenome (environmental sample) datasets. This system, which we call FAT-CAT (for Fast Approximate Tree Classification), uses hidden Markov models placed at internal nodes of PhyloFacts trees to classify sequences to different levels of functional hierarchies. Subtree nodes are annotated automatically using data available for sequences descending from those nodes, allowing both functional and taxonomic inference for sequences classified to those nodes. The PhyloFacts Phylogenomic Database is available at

Kimmen Flyer.pdf

Author: Jonathan Eisen

I am an evolutionary biologist and a Professor at U. C. Davis. My research focuses on the origin of novelty (how new processes and functions originate). To study this, I focus on sequencing and analyzing genomes of organisms, especially microbes, and using phylogenomic analysis
