Guest post from Kimmen Sjölander about FAT-CAT phylogenomics pipeline

Below is a guest post from my friend and colleague Kimmen Sjölander, Prof. at UC Berkeley and phylogenomics guru. 


Announcing the FAT-CAT phylogenomic annotation webserver.

FAT-CAT is a new web server for phylogenomic prediction of function and ortholog identification, and for taxonomic origin prediction of metagenome sequences, based on HMM-based classification of protein sequences to >93K pre-calculated phylogenetic trees in the PhyloFacts database. PhyloFacts is unique among phylogenomic databases in combining broad taxonomic coverage (more than 7.3M proteins from >99K unique taxa across the Tree of Life, including targeted coverage of genomes from Eukaryotes, Bacteria and Archaea) with functional data integrated on trees for Pfam domains and multi-domain architectures. PhyloFacts trees include functional and annotation data from UniProt (SwissProt and TrEMBL), GO, BioCyc, Pfam, Enzyme Commission and other sources. The FAT-CAT pipeline uses HMMs at all nodes in PhyloFacts trees to classify user sequences to different levels of functional hierarchies, based on the subtree HMM giving the sequence the strongest score. Phylogenetic placements within orthology groups defined on PhyloFacts trees are used to predict function and to predict orthologs. Sequences from metagenome projects can be classified taxonomically based on the MRCA of the sequences descending from the top-scoring subtree node. Because of the broad taxonomic and functional coverage, FAT-CAT can identify orthologs and predict function for most sequence inputs. We’re working to make FAT-CAT less computationally intensive so that users will be able to upload entire genomes for analysis; in the interim, we limit users to 20 sequence inputs per day. Registered users are given a higher quota (see details online). We’d love to hear from you if you have feature requests or bug reports; please send any to Kimmen Sjölander – kimmen at berkeley dot edu (parse appropriately).
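To make the two ideas in the announcement concrete (pick the subtree whose HMM scores the query best, then read the taxonomic call off the MRCA of that subtree's sequences), here is a minimal Python sketch. It is not FAT-CAT's code: the `Node` class, the scores, and the toy lineages are all made up for illustration, and the MRCA is approximated as the longest shared prefix of the leaf lineages.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy gene tree; every node is assumed to have its own HMM."""
    name: str
    hmm_score: float          # hypothetical score of the query against this node's HMM
    lineage: tuple = ()       # taxonomic lineage, only meaningful on leaves
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node itself and then all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def top_scoring_node(root):
    """Return the node whose HMM gives the query the strongest score."""
    return max(walk(root), key=lambda n: n.hmm_score)


def mrca_lineage(node):
    """Longest lineage prefix shared by every leaf under the node."""
    lineages = [n.lineage for n in walk(node) if not n.children]
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


# Toy tree with invented scores and lineages (assumptions, not PhyloFacts data).
ecoli = Node("E. coli", 10.0,
             ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"))
salmonella = Node("Salmonella", 12.0,
                  ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"))
bsubtilis = Node("B. subtilis", 3.0,
                 ("Bacteria", "Firmicutes", "Bacilli", "Bacillus"))
gamma = Node("gamma_subtree", 25.0, children=[ecoli, salmonella])
root = Node("root", 8.0, children=[gamma, bsubtilis])

best = top_scoring_node(root)
print(best.name, mrca_lineage(best))
```

In this toy case the query scores best against the internal `gamma_subtree` node, so the taxonomic call stops at the rank the two leaves share rather than committing to either genus.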

Worth a read: Jim Staley on a "Universal Species Concept" and the history of microbial species concepts

An interesting paper came up in my automated Google searches for “phylogenomics”: Transitioning Toward a Universal Species Concept for the Classification of all Organisms | InTechOpen.  It is by Jim Staley, who has been writing a lot about microbial species concepts in the last few years.  In this paper he tries to bridge the gap between bacteria/archaea and eukaryotes in terms of species concepts.  Not sure how I feel about everything in the paper, but it has a really nice history of how species have been defined for bacteria. He breaks down this history into four periods:

  • Discovery of microorganisms,
  • Advent of pure cultures and phenotypic features,
  • Introduction of molecular analyses and
  • Gene sequencing and genomics.
And goes through a bit of detail on each one.  He also discusses what he sees as a need for a universal species concept and even makes some suggestions about how it might be implemented.  Definitely worth a read.  
Some related posts of mine and/or links of potential interest:

Pipet or Pipette?! And some updates on our project :)

First of all, how do you spell the thing?? I’ve seen it both ways.. pipet and pipette. I’m thinking it’s pipette because it doesn’t have the red squiggly spelling error sign under it. Just wondering!

 

Now for the good stuff! We’ve been rocking and rolling in the lab (except for the week I came down with the flu). We have so many samples, and it would be a bummer to do EVERYTHING (DNA extractions, PCR, gel confirmation, etc.) for every single sample and have disappointing results. While we are optimistic about our results, we are currently preparing samples from the first coral ponds at different time points for sequencing just so we can see how our results look in terms of succession of the microbial community in the pond. If we get cool results (crossing our fingers) we’ll go back to the pipettes and prepare the rest of the samples!

 

We’ve got a lot of work to do but we are making progress. We are presenting to the lab next Friday about our project. I am really excited to get feedback from experts who know a lot more than me!

Advice on asking for letters of recommendation (updated May 2013)

This is based off an e-mail I sent recently to a student and someone suggested I post it here:

Asking for letters of recommendation

In general, I and others are happy to write letters of recommendation for people… it’s part of our jobs after all.  However, there are some tips I would offer anyone soliciting letters at any stage of their career.

1)  Don’t ask me for a letter only a few days before it’s due.  This seems like such a simple concept but one that is violated so often.

2)  If you ask me for a letter, you need to send a copy of your CV.  No matter how long I’ve worked with you, there’s probably still information in there I didn’t know and this helps me write a letter that doesn’t sound like a form letter.

3) Send me a description of the program you’re applying for and why.  Again, this helps me write a better letter and saves me from having to trawl the internet for information.

4) Make it as easy as possible for me to write the letter!  This is especially critical with professors.  If the letter needs to be mailed, you should hand me a stamped, already addressed envelope so all I have to do is drop in a letter and throw it in the outgoing mail.  If it’s an electronic form, provide me with detailed instructions and links.

5) Don’t attempt to bribe me.  I’m not kidding… for example once I got a handwritten request for a letter of recommendation along with $50.  This is not a good idea!

(Updated with two more in May 2013)

6) Before asking a post-doc or a project scientist for a letter of recommendation, make sure that you don’t actually need one from the professor.

7) Don’t list me as a reference for anything without at least asking first.

The gurus of evolution predict the future #PLOSBiology

Nice commentary / viewpoint piece in PLOS Biology last month: PLOS Biology: Evolutionary Biology for the 21st Century

Citation: Jonathan Losos, Stevan J. Arnold, Gill Bejerano, E. D. Brodie III, David Hibbett, Hopi E. Hoekstra, David P. Mindell, Antónia Monteiro, Craig Moritz, H. Allen Orr, Dmitri A. Petrov, Susanne S. Renner, Robert E. Ricklefs, Pamela S. Soltis, Thomas L. Turner (2013) Evolutionary Biology for the 21st Century. PLoS Biol 11(1): e1001466. doi:10.1371/journal.pbio.1001466

They discuss issues like Biodiversity Informatics (see Figure to the left) and evolutionary applications like evolutionary medicine, food production, sustaining biodiversity, computational algorithms, and justice.  They also discuss issues like the oncoming onslaught of specimens and the need to link up with museums who have expertise in dealing with such issues.  Anyway – it is worth a look.  Not the most visionary of pieces ever but it has some concrete suggestions and predictions that will be of use.

Worth a look (from arXiv): Robust estimation of microbial diversity in theory and in practice

I confess I do not have the time right now to delve into this in detail but this seems of interest: Robust estimation of microbial diversity in theory and in practice. From Bart Haegeman, Jerome Hamelin, John Moriarty, Peter Neal, Jonathan Dushoff and Joshua S. Weitz (full disclosure: I am friends and a co-author with some of the authors here).

Abstract: Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao’s estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics (“Hill diversities”), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao’s estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.

It is in press in ISME and freely available at arXiv.
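For anyone who wants to poke at the quantities named in the abstract, here is a small Python sketch of the textbook versions: Chao's richness estimator and plug-in Hill diversities (order 0 is observed richness, order 1 is the exponential of Shannon entropy, order 2 is the inverse Simpson index). This is not the authors' code, and it does not implement the lower and upper estimates they construct; the function names and the example counts are my own assumptions.

```python
import math


def chao1(counts):
    """Chao's richness estimate from per-species counts in a sample."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)   # species seen exactly once
    f2 = sum(1 for c in counts if c == 2)   # species seen exactly twice
    if f2 == 0:
        # Bias-corrected form, used here to avoid dividing by zero.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


def hill_diversity(counts, q):
    """Plug-in Hill diversity of order q (an effective number of species)."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # Order 1 is defined as a limit: the exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))


# Made-up sample: a few abundant species and a tail of rare ones.
sample = [50, 30, 10, 5, 2, 2, 1, 1, 1]

print("observed richness:", hill_diversity(sample, 0))
print("Chao1 richness   :", chao1(sample))
print("Shannon (q=1)    :", hill_diversity(sample, 1))
print("Simpson (q=2)    :", hill_diversity(sample, 2))
```

The richness numbers depend entirely on the singletons and doubletons in the tail, while the order 1 and order 2 values are dominated by the abundant species, which is the intuition behind the abstract's recommendation.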

Psyched: have rescued old MobileMe and other websites after Apple annoyingly cancelled them by posting to Dropbox

A few years ago I used to post many things for the Web through Apple’s Mobile Me service.  Annoyingly, Apple ended up treating this like they treat connectors and plugs for their phones and Macs.  They just decided to move their online system to iCloud and deleted all the old websites hosted through Mobile Me.  Which left me in the lurch.  And then I forgot about it.  But I have been rediscovering how annoying this is since I had a lot of information out there on old papers and projects and now it is gone from the interwebs.  So I have been trying to re-share all of this stuff.

One way has been to post data from old papers to Figshare.  See for example:

But I also had all sorts of website related material that is annoyingly gone.  And yesterday I discovered at least a simple solution to this.  I can put all my old websites in my Dropbox public folder and share the link to those files with others and they work pretty well.

See for example my re-releasing of some of my April 1 and other joke websites:

Also – I have reposted some of my old websites

I have always been into sharing scientific information on the web since, well, the web came out.  And I am going to dig around for other old websites to post them via Dropbox.  If anyone knows an easy way to upload / convert an old website into WordPress, I suppose I could load all the old pages into my current WordPress site, but this was a much easier temporary solution.  Still annoyed with Apple but glad Dropbox allows a simple solution.

Soliciting opinions on paper "A congruent phylogenomic signal places eukaryotes within the Archaea"

Been reading this paper which I posted about to Twitter recently: A congruent phylogenomic signal places eukaryotes within the Archaea.  It is very interesting.  Not sure what to make of it though.  So – in contrast to my normal ways of putting my ideas out there first and asking for / hoping for comment I thought – let’s mix things up.  So – I am soliciting comments from people BEFORE I write down my comments.  Any ideas / thoughts / comments would be welcome.

Full citation:

Proc Biol Sci. 2012 Dec 22;279(1749):4870-9. doi: 10.1098/rspb.2012.1795. Epub 2012 Oct 24.
A congruent phylogenomic signal places eukaryotes within the Archaea.
Williams TA, Foster PG, Nye TM, Cox CJ, Embley TM.

My new microbial art for my office: salt evaporation ponds and geothermal spring stamps

Thanks to Russell Neches in my lab I found out about the Earthscapes series stamps from the US Postal Service.  Two of the stamps feature microbial ecosystems and I ordered framed, enlarged versions of the photos for my office.

They are available at the links below:

Go microbes.

PLoSOne paper: Parallel polymorphisms for pepper population phylogenetics, from #UCDavis, not #Pepperspray

Interesting new paper from colleagues at UC Davis: PLOS ONE: Characterization of Capsicum annuum Genetic Diversity and Population Structure Based on Parallel Polymorphism Discovery with a 30K Unigene Pepper GeneChip.

Press release is here: http://www.news.ucdavis.edu/search/news_detail.lasso?id=10497

Good to see something on peppers from UC Davis not about spraying.