Guest post from Kimmen Sjölander about FAT-CAT phylogenomics pipeline

Below is a guest post from my friend and colleague Kimmen Sjölander, professor at UC Berkeley and phylogenomics guru.


Announcing the FAT-CAT phylogenomic annotation webserver.

FAT-CAT is a new web server for phylogenomic prediction of function and ortholog identification, and for taxonomic origin prediction of metagenome sequences. It works by HMM-based classification of protein sequences to >93K pre-calculated phylogenetic trees in the PhyloFacts database. PhyloFacts is unique among phylogenomic databases in combining broad taxonomic coverage (more than 7.3M proteins from >99K unique taxa across the Tree of Life, including targeted coverage of genomes from Eukaryotes, Bacteria and Archaea) with functional data integrated on trees for Pfam domains and multi-domain architectures. PhyloFacts trees include functional and annotation data from UniProt (SwissProt and TrEMBL), GO, BioCyc, Pfam, Enzyme Commission and other sources.

The FAT-CAT pipeline uses HMMs at all nodes in PhyloFacts trees to classify user sequences to different levels of functional hierarchies, based on the subtree HMM giving the sequence the strongest score. Phylogenetic placements within orthology groups defined on PhyloFacts trees are used to predict function and to predict orthologs. Sequences from metagenome projects can be classified taxonomically based on the MRCA (most recent common ancestor) of the sequences descending from the top-scoring subtree node. Because of the broad taxonomic and functional coverage, FAT-CAT can identify orthologs and predict function for most sequence inputs.

We’re working to make FAT-CAT less computationally intensive so that users will be able to upload entire genomes for analysis; in the interim, we limit users to 20 sequence inputs per day. Registered users are given a higher quota (see details online). We’d love to hear from you if you have feature requests or bug reports; please send them to Kimmen Sjölander: kimmen at berkeley dot edu (parse appropriately).
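To make the taxonomic step concrete, here is a minimal Python sketch (not FAT-CAT's actual code) of picking the top-scoring subtree node and reporting the taxonomy shared by the sequences beneath it. It assumes the query's HMM scores against each node's subtree HMM have already been computed by some other tool, and it approximates the MRCA by the longest common prefix of the leaves' taxonomic lineages. The `Node` class, the tree, the scores, and the lineages are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a hypothetical PhyloFacts-style tree (illustrative only)."""
    name: str
    # Assumed precomputed: score of the query against this node's subtree HMM.
    score: float = float("-inf")
    # Taxonomic lineage, highest rank first; only meaningful on leaves.
    lineage: tuple = ()
    children: list = field(default_factory=list)


def iter_nodes(node):
    """Yield every node in the tree in preorder (parents before children)."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def leaf_lineages(node):
    """Collect the lineages of all leaf sequences descending from node."""
    if not node.children:
        return [node.lineage]
    return [lin for child in node.children for lin in leaf_lineages(child)]


def mrca_lineage(lineages):
    """Longest shared lineage prefix, i.e. the most specific common taxon."""
    shared = []
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return tuple(shared)


def classify(root):
    """Pick the top-scoring node, then report the MRCA of its descendants."""
    best = max(iter_nodes(root), key=lambda n: n.score)
    return best.name, mrca_lineage(leaf_lineages(best))


# Hypothetical example tree; leaves are left unscored for simplicity.
leaf_a = Node("seqA", lineage=("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"))
leaf_b = Node("seqB", lineage=("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"))
leaf_c = Node("seqC", lineage=("Bacteria", "Firmicutes", "Bacilli", "Bacillus"))
gamma = Node("gamma_subtree", score=250.0, children=[leaf_a, leaf_b])
root = Node("root", score=180.0, children=[gamma, leaf_c])

print(classify(root))
# ('gamma_subtree', ('Bacteria', 'Proteobacteria', 'Gammaproteobacteria'))
```

In this sketch a tie in score goes to the node visited first in the preorder walk (the more ancestral one), which is the conservative choice; a real pipeline may well handle ties, leaf scores, and score thresholds differently.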

The gurus of evolution predict the future #PLOSBiology

Nice commentary / viewpoint piece in PLOS Biology last month: PLOS Biology: Evolutionary Biology for the 21st Century

Citation: Jonathan Losos, Stevan J. Arnold, Gill Bejerano, E. D. Brodie III, David Hibbett, Hopi E. Hoekstra, David P. Mindell, Antónia Monteiro, Craig Moritz, H. Allen Orr, Dmitri A. Petrov, Susanne S. Renner, Robert E. Ricklefs, Pamela S. Soltis, Thomas L. Turner (2013) Evolutionary Biology for the 21st Century. PLoS Biol 11(1): e1001466. doi:10.1371/journal.pbio.1001466

They discuss issues like biodiversity informatics and evolutionary applications such as evolutionary medicine, food production, sustaining biodiversity, computational algorithms, and justice. They also discuss the oncoming onslaught of specimens and the need to link up with museums that have expertise in dealing with such issues. Anyway, it is worth a look. Not the most visionary of pieces ever, but it has some concrete suggestions and predictions that will be of use.

Psyched: I have rescued my old MobileMe and other websites, which Apple annoyingly cancelled, by posting them to Dropbox

A few years ago I used to post many things to the Web through Apple’s MobileMe service. Annoyingly, Apple ended up treating this the way they treat connectors and plugs for their phones and Macs. They just decided to move their online system to iCloud and deleted all the old MobileMe websites. Which left me in the lurch. And then I forgot about it. But I have been rediscovering how annoying this is, since I had a lot of information out there on old papers and projects and now it is gone from the interwebs. So I have been trying to re-share all of this stuff.

One way has been to post data from old papers to Figshare. See for example:

But I also had all sorts of website-related material that is annoyingly gone. And yesterday I discovered a simple solution to this, at least. I can put all my old websites in my Dropbox Public folder and share the link to those files with others, and they work pretty well.

See for example my re-releasing of some of my April 1 and other joke websites:

Also, I have reposted some of my old websites

I have always been into sharing scientific information on the web since, well, the web came out. And I am going to dig around for other old websites to post via Dropbox. If anyone knows an easy way to upload / convert an old website into WordPress, I suppose I could load all the old pages into my current WordPress site, but this was a much easier temporary solution. Still annoyed with Apple, but glad Dropbox allows a simple solution.

Nice timing: Our paper on the Darwin’s Finch genome is out today on Darwin’s birthday

Birthday party for Darwin in 2009

Well, I assume this timing was deliberate on the part of the folks at BioMed Central, but I am not sure. Our paper on the genome of one of Darwin’s finches is out today in BMC Genomics: Insights into the evolution of Darwin’s finches from comparative analysis of the Geospiza magnirostris genome sequence.

Abstract of the paper:

Background
A classical example of repeated speciation coupled with ecological diversification is the evolution of 14 closely related species of Darwin’s (Galápagos) finches (Thraupidae, Passeriformes). Their adaptive radiation in the Galápagos archipelago took place in the last 2–3 million years and some of the molecular mechanisms that led to their diversification are now being elucidated. Here we report evolutionary analyses of genome of the large ground finch, Geospiza magnirostris.
Results
13,291 protein-coding genes were predicted from a 991.0 Mb G. magnirostris genome assembly. We then defined gene orthology relationships and constructed whole genome alignments between the G. magnirostris and other vertebrate genomes. We estimate that 15% of genomic sequence is functionally constrained between G. magnirostris and zebra finch. Genic evolutionary rate comparisons indicate that similar selective pressures acted along the G. magnirostris and zebra finch lineages suggesting that historical effective population size values have been similar in both lineages. 21 otherwise highly conserved genes were identified that each show evidence for positive selection on amino acid changes in the Darwin’s finch lineage. Two of these genes (Igf2r and Pou1f1) have been implicated in beak morphology changes in Darwin’s finches. Five of 47 genes showing evidence of positive selection in early passerine evolution have cilia related functions, and may be examples of adaptively evolving reproductive proteins.
Conclusions
These results provide insights into past evolutionary processes that have shaped G. magnirostris genes and its genome, and provide the necessary foundation upon which to build population genomics resources that will shed light on more contemporaneous adaptive and non-adaptive processes that have contributed to the evolution of the Darwin’s finches.

Figure 1

There is a long long long story behind this paper.  Too long for me to write up right now.  I wrote up some of the story for a Figshare posting of the genome data last year.

“Darwin’s Finches” are a model system for the study of various aspects of evolution and development. In 2008 we began a project to sequence the genomes of some of these species, inspired by the (then) upcoming celebration of the 200th anniversary of the birth of Charles Darwin (which was in February 2009). The project started with a brief discussion at the AGBT meeting in 2008 and then continued via an email conversation between Jonathan Eisen and Jason Affourtit about the possibility of a collaboration involving the 454 company (which was looking for projects to highlight the power of its then relatively new 454 sequencing machines). After further discussions between Jonathan Eisen, his brother Michael Eisen (who separately had become interested in Darwin’s finches) and people from 454, it was decided that this was a potentially good project for a scientific and marketing collaboration.

In these conversations it was determined that the most likely limiting factor would be access to DNA from the finches. This was largely because the Galapagos Islands (where the finches reside) are a National Park in Ecuador and also a World Heritage site. Collection of samples there for any type of research is highly regulated. Thus, Jonathan Eisen made contact with Peter and Rosemary Grant, the most prominent researchers working on the finches, with whom Eisen had discussed sequencing the finch genomes in the early 2000s. In that previous conversation it was determined that the sequencing would be too expensive to carry out without a major fundraising effort. However, with the advent of “next generation” sequencing methods such as 454, the total costs of such a project would be much lower.

In the conversations with the Grants, the Grants offered to ask around to see if anyone had sufficient amounts of DNA (or access to samples), which would be needed for genome library construction. Subsequently they identified Arhat Abzhanov from Harvard as someone who likely had samples from many of the finch species, as well as permission to do DNA-based work on them. Abzhanov offered to provide samples from three key species (large ground finch Geospiza magnirostris, large cactus finch G. conirostris and sharp-billed finch G. difficilis) and DNA was sent to Roche-454 for sequencing in July of 2008. In August, the first “test” sequence data were provided from Geospiza magnirostris. A plan was then made to generate additional data, and Roche offered to do the sequencing at their center at a steep discount. Funds were raised by Jonathan Eisen, Greg Wray, Monica Riley, and others to pay for the sequencing, and over the next year or so three sequencing bursts were conducted at Roche-454.

That is a decent summary of the background. The details on the science are in the paper. What the background does not say is that the project languished for years, as we did not have funds to support the actual analysis of the genomes and it was kind of out of my normal area of expertise. Along the way, I did a poor job of communicating with some of the initial parties in the project (e.g., I did a really bad job of communicating with Greg Wray, who had provided some of the funds, and I will forever be trying to make things up to him). Anyway, thankfully Arhat eventually pulled together a group of people led by Chris Ponting to help analyze the genome, and Chris led the way to the paper that is out today. Only four years after our original goal.

I have been a birder and an evolutionary biologist for many many many years. Thus this is kind of a cool project for me.  When I was in the Galapagos in 2002 I dreamed of doing a project like this – and even started doodling Darwin’s finches all over the place – including on some of the styrofoam cups we sent down to the bottom of the ocean on the outside of the Alvin sub as part of a deep sea research cruise I went on.  See below:



Some related posts:
From 2002


Me, in the Galapagos in 2002


Rhodopsins Rhodopsins everywhere …

I was browsing through this paper (largely due to my interest in sequencing genomes of novel organisms): Genome Biology: Genome of Acanthamoeba castellanii highlights extensive lateral gene transfer and early evolution of tyrosine kinase signaling.

And I noticed they found something very interesting. “We identified two rhodopsins both with C-terminal histidine kinase and response regulator domains with homology to the sensory rhodopsins of the green algae that represent candidates for light sensors in Ac (Figure 3).” It seems they found some homologs of the proteorhodopsin / halorhodopsin family of proteins, which I have been interested in for years. Check out Figure 3:

Every couple of months there is a new group of organisms that is found to have a member of this gene family.  See for example: Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux and Genome sequence of the Antarctic rhodopsins-containing flavobacterium Gillisia limnaea for papers in which I was involved where Rhodopsins were part of the story.  Also see the Venter et al. Sargasso paper: PDF.  Anyway – just a quick post for those out there interested in rhodopsins and the like …

Evolutionary Biology of the Built Environment Working Group: Call for Participants

Call for participants: Evolutionary Biology of the Built Environment Working Group. Details are copied from the announcement below:

The Basics: We need your help. We are organizing the first working group aimed at understanding the evolutionary biology of the built environment—our bedrooms, our houses, our backyards and our cities. This working group will occur June 10 – 14, 2013, in Durham, North Carolina. We are now inviting applications for participants in the working group.
Why: As recently as one hundred thousand years ago the indoor environment did not exist. Yet, this is now where most humans spend the majority of their life. One might imagine that in its relatively short history the built environment might have had time to accumulate very few species. Far from the case, an emerging body of literature shows that hundreds of multicellular species and thousands of unicellular species can be found in houses and buildings more generally. Among the species found in homes are those whose presence (or absence) is likely to have a large impact on human health and well-being, species including beneficial microbiota on the body but also pathogens and potential pathogens or toxic species such as extremophilic fungi. Yet, with the exception of a few deadly pathogens (such as MRSA), the evolutionary history of most of the species with which we most intimately interact in our homes remains unknown. To remedy our lack of knowledge and take advantage of recent advances in disparate fields we will bring together scientists studying both the fauna (microbiologists, entomologists, mammalogists, and any other -ologists you can convince us have some bearing on house biomes) and environment (engineers, architects) along with social scientists (anthropologists) and evolutionary biologists (e.g. theoreticians, bioinformaticians, geneticists) to begin to build a framework for the evolution of the indoor and more generally built biome. Our goal is to develop a framework for a comprehensive understanding of the evolution of the species we most intimately interact with, particularly in the context of considering how to build and design our environments so as to favor beneficial (rather than dangerous) evolutionary trajectories. 
We aim to understand both how to prevent the extinction of beneficial species and to favor the evolution of lineages and species with beneficial attributes, whether those be ecological functions, health benefits or simply aesthetic value.
Who: We’d like to convene a diverse group of scientists and practitioners at various stages in their careers, from graduate students and post-docs to senior scientists, representing an array of disciplines including the organismal -ologies (e.g. microbiology, entomology, etc.), engineering, architecture, anthropology, evolution, genetics, bioinformatics, art and design. We want to be inclusive of any field that you can convince us has something to bear on studying evolution in the built environment.
How: We are currently accepting applications to be part of this working group. If you are interested, you can apply online here, but do so soon. We will select a group of 30 scholars and practitioners from the applicant pool who will meet in Durham with the goal of producing a series of general audience and peer-reviewed publications about the evolutionary biology of the built environment.
Sponsored by a partnership between the Sloan Foundation and the National Evolutionary Synthesis Center.
Have questions? Drop us a note at yourwildlife@gmail.com.

RIP Carl Woese: Collecting posts / notes / other information about my main science hero here

My tribute to Carl Woese 12/30/12

Sadly, Carl Woese has passed away.  I am collecting some links and posts about him here in his memory.  He was without a doubt the person who most influenced my career as a scientist.

News stories about Woese’s passing

Some of my posts about Woese

Woese Tree of Life pumpkin (by J. Eisen)

Storification of tweets and other posts about his passing: View the story “RIP Carl Woese” on Storify

Other posts worth reading about Woese’s passing

Some videos with Woese 





Miscellaneous

My graduate student Russell Neches used a laser to etch a picture of Carl Woese on a piece of toast.


People not Projects: the Moore Foundation continues to revolutionize marine microbiology w/ its Investigator program

People not Projects.

It is such a simple concept. But it is so powerful. I first became aware of this idea as it relates to funding scientific research through the Howard Hughes Medical Institute’s Investigator program. Their approach (along with a decent chunk of money) has helped revolutionize biomedical science. And thus I was personally thrilled to see this concept introduced to marine microbiology a few years back with the Gordon and Betty Moore Foundation’s “Marine Microbiology Initiative Investigator” program. Launched in 2004, it helped revolutionize marine microbiology studies in the same way HHMI’s Investigator program revolutionized biomedical studies.

The first GBMF MMI Investigator program ran from 2004 to 2012. And the people supported were pretty darn special:

Now, I suppose I am a little biased here, because at the same time GBMF launched this program they also put a bunch of money into the general area of marine microbiology, and I have been the recipient of some of that money. For example, I got a small amount of money as part of the GBMF-funded work at the J. Craig Venter Institute on the Sargasso Sea and Global Ocean Sampling metagenomic sequencing projects, and I also had a subcontract from UCSD/JCVI to do some work as part of the “CAMERA” metagenomic database project. I ended up being a coauthor on a diverse collection of papers associated with these projects, including the Sargasso metagenome paper and this review, GOS1, GOS2, and my stalking the 4th domain paper.

I am also a bit biased in that I have worked with many of the people on the initial MMI Investigator list, some before and some after the awards, including papers with Jen Martiny, Ed DeLong, Alex Worden and Ginger Armbrust, and Mary Ann Moran.

But perhaps most relevant in terms of possible bias towards the Gordon and Betty Moore Foundation is that in 2007 my lab received funds through the MMI program for a collaborative project with Jessica Green and Katie Pollard, our “iSEEM” project on “Integrating Statistical, Ecological and Evolutionary analyses of Metagenomic Data” (see http://iseem.org), which was one of the most successful collaborations in which I have ever been involved. This project produced something like a dozen papers and many major new developments in analyses of metagenomic data, including 16S copy correction, sifting families, microbeDB, PD of metagenomes, WATERs, BioTorrents, AMPHORA, and STAP. This project just ended, but Katie Pollard and I just got additional funds from GBMF to continue related work.

So sure, I am biased. But the program is simply great. In the eight years since the initial grants, the Gordon and Betty Moore Foundation has helped revolutionize marine microbiology. And a lot of this came from the Investigator program and its emphasis on people not projects. I note that the Moore Foundation has clearly decided that this “people not projects” concept is a good one. A few years ago they partnered with HHMI to launch a Plant Sciences Investigator Program, which I wrote about here.

It was thus with great excitement that I saw the call for applications for the second round of the MMI Investigator program. I certainly pondered applying. But for many reasons I decided not to. And today the winners of this competition have been announced and, well, it is a very impressive crew:

Some of the same crowd as the previous round. Some new people. Some people not there from the previous round. All of them are rock stars in their areas, especially if one takes into account how senior they are (the more junior people are stars in development). And all have done groundbreaking work in various areas relating to marine microbiology. The organisms covered here run the gamut, including viruses, bacteria, archaea, and microbial eukaryotes. The areas of focus covered range from biogeochemistry to ecosystem modeling, with everything in between. It really is an impressive group. DeLong pioneered metagenomics and helped launch studies of uncultured microbes in the oceans. Karl has led the Hawaii Ocean Time-series and done other brilliant work. Sullivan and Rohwer are pushing the frontiers of viral studies in the oceans. Allen, Armbrust, and Worden are among the leaders in genomic studies of microbial eukaryotes in the marine environment. Dubilier, Bidle, Fuhrman and Stocker (I double-listed Follows here in the original post …), though they focus on very different aspects of marine microbes, are helping lead the charge in understanding interactions across the domains of life in the marine environment. Orphan, Saito, Deutsch, Follows and Pearson are on the cutting edge of biogeochemical studies and are trying to link experimental studies of microbes to the biogeochemistry of the oceans.

The great thing about the “people not projects” concept is that the people funded here get to follow their own path. They are not going to be constrained by the complications and sometimes idiocy of the grant review process. They in essence get to do whatever they want. Freedom to follow their noses. Or their guts. Or whatever. It is a refreshing concept and, as mentioned above, has been revolutionary in various areas of science. There has been a slow but steady spread of the “people not projects” concept to various federal agencies too, but it seems to be more of a private foundation type of strategy. Federal agencies are so risk averse in funding that this type of concept does not work well there. I wish there were more of it. But I am at least thankful for what HHMI and GBMF and Wellcome and Sloan and other private groups are doing in this regard. Now, sure, none of these private foundations does everything perfectly. They have blunders here and there like everyone else. But without a doubt I think we need more of the People not Projects concept.
Oh, and another good thing. GBMF is quite a big supporter of Open Science in its various guises. So one can expect much of the data, software, and papers from their funding to be widely and openly available.
It is a grand time to be doing microbiology largely due to revolutions in technology and also to changes in the way we view microbes on the planet.  It is an even grander time to be doing marine microbiology due to the dedication of the Gordon and Betty Moore Foundation to this important topic.  

Twisted tree of life award #14: @nytimes and Nathaniel Rich on Immortal Jellyfish

Well, this article by Nathaniel Rich in today’s New York Times Magazine certainly has gotten people talking: Can a Jellyfish Unlock the Secret of Immortality? Alas, from a scientific point of view there are numerous problems with it. So many that Paul Raeburn at the Knight Science Journalism Tracker at MIT has published a major takedown: First we get proof of heaven; now the secret of immortality.
Now, the science about immortality in the article is certainly bad. But that is not what I am here to discuss. I am here to discuss the parts of the article about evolution. I suppose if I had read the article online instead of in print, I might already have been attuned to potential evolution problems from the correction on the first page:

This article has been revised to reflect the following correction:
Correction: November 29, 2012
An earlier version of this article misstated the title of Charles Darwin’s classic book on the subject of evolution. It is “On The Origin of Species,” not “On the Origin of the Species.”

Oops. Not a good start. The article has a lot of background about jellyfish, and in particular about one person who is studying them and claiming this one species is immortal (which it is not). It is the higher vs. lower organism meme in the article that drives me crazy:

Today the outermost twigs and buds of the Tree of Life are occupied by mammals and birds, while at the base of the trunk lie the most primitive phyla — Porifera (sponges), Platyhelminthes (flatworms), Cnidaria (jellyfish).

And then 

“The mystery of life is not concealed in the higher animals,” Kubota told me. “It is concealed in the root. And at the root of the Tree of Life is the jellyfish.”

Seriously? The root of the tree of life is the jellyfish? And higher vs. lower organisms? What exactly is a higher organism? Does this mean that jellyfish have not evolved since their branch separated from the trunk of the animal tree? Oh, and what about the rest of the Tree of Life, you know, outside of animals for example? Aaargh.
The higher vs. lower meme continues with this quote:

Hydrozoans, he suggests, may have made a devil’s bargain. In exchange for simplicity — no head or tail, no vision, eating out of its own anus — they gained immortality.

Really?  So there is a tradeoff between complexity and immortality?  So does this mean all simple organisms are more immortal?  And all complex ones are doomed?  Where does this notion even come from?
For helping perpetuate the higher vs. lower organism meme (which drives me batty) I am awarding the author and the editor and the NY Times my coveted “Twisted Tree of Life” award.


As an aside, the article is littered with other painful statements like

It is possible to imagine a distant future in which most other species of life are extinct but the ocean will consist overwhelmingly of immortal jellyfish, a great gelatin consciousness everlasting.

So – this jellyfish operates in the absence of an ecosystem?  Suppose individual organisms are “immortal” as claimed in the article.  What exactly will they eat when everything else is gone?
Plus there is a conspiracy part that is lame.

You might expect that biotech multinationals would vie to copyright its genome; that a vast coalition of research scientists would seek to determine the mechanisms by which its cells aged in reverse; that pharmaceutical firms would try to appropriate its lessons for the purposes of human medicine; that governments would broker international accords to govern the future use of rejuvenating technology. But none of this happened.

Really?  So all the scientists and companies of the world have ignored this amazing finding?  Maybe, just maybe you might think that is because this is BOGUS?
And then there is the bogus “small bodied organism” problem.

He cited this as an example of a phenomenon he calls the Small’s Rule: small-bodied organisms are poorly studied relative to larger-bodied organisms. There are significantly more crab experts, for instance, than hydroid experts.

What?  Is this even remotely serious?  So ignore Drosophila as a model for animals.  Or mice for that matter.  Ignore Arabidopsis as a model for plants.  Ignore yeast too.  And E. coli.  Uggh.  Completely inane. 

Convoluted title, cool paper in #PLoSGenetics on relative of insect mutualists causing a human infection

Saw this tweet a few minutes ago:

The title of the paper took me a reread or two to understand. But once I got what they were trying to say, I was intrigued. And so I went to the paper: PLOS Genetics: A Novel Human-Infection-Derived Bacterium Provides Insights into the Evolutionary Origins of Mutualistic Insect–Bacterial Symbioses. And it is loaded with interesting tidbits. First, the opening section of the results details the history of the infection in a 71-year-old man, his recovery, and the isolation and characterization of a new bacterial strain. Phylogenetic analysis revealed that this was a close relative of the Sodalis endosymbionts of insects.

And then comparative genomics revealed a bit more detail about the history of this strain, its relatives, and some of the insect endosymbionts. Plus, it allowed the authors to make some jazzy figures such as

And this and other comparative analyses revealed some interesting findings. As summarized by the authors:

Our results indicate that ancestral relatives of strain HS have served as progenitors for the independent descent of Sodalis-allied endosymbionts found in several insect hosts. Comparative analyses indicate that the gene inventories of the insect endosymbionts were independently derived from a common ancestral template through a combination of irreversible degenerative changes. Our results provide compelling support for the notion that mutualists evolve from pathogenic progenitors. They also elucidate the role of degenerative evolutionary processes in shaping the gene inventories of symbiotic bacteria at a very early stage in these mutualistic associations.

The paper is definitely worth a look.