Jonathan Eisen

Microbiology blog video …

Just got this from Jenna Morgan. I guess I did miss something in Toronto in 2007 …

Worst New Omics Word Award: Museomics

OK. So I coined my own omics word many years ago (phylogenomics). Fine. Sue me. But the spread of omics words is getting really icky. And a new one really seems lame. The word is “museomics”, which I saw for the first time in a press release today from Cold Spring Harbor Press about a paper in Genome Research.

I mean, the study of the Tasmanian Tiger seems like it could be interesting (have not read the paper) and there is some really fun stuff happening these days using Roche/454 sequencing.

But museomics? Not to disparage museums, which are critical to all of biology in my opinion. But to me the term in a way treats museums as simply a place we store organisms before we get DNA out of them. For this, the team at Penn St. that led this project and apparently coined the term museomics (see here where they define the term) is getting my new “Worst New Omics Word Award.”

In addition I am proposing my favorite new Omics words using their model:

Roadkillomics (to go along with roadside field guides)
Backyardomics (e.g., could be some sort of native plant thing)
Hospitalomics (e.g., MRSA)
Backoftheenvelopomics (for the anthrax case)
Stuffinmypocketomics (hey, I have found some $&%$ stuff there)
Restaurantomics (e.g., O157H7)
Footballomics (there have been studies of MRSA transmission in games, why not omics)
Slowfoodomics (genomics of things you get within 50 miles of your neighborhood)
Ebayomics (genomics of things you get off Ebay)
Stuffthecatdraggedinomics (my cats would like this)
Wherethesundontshinomics (human microbiome?)

Any other suggestions?

Calling all phylogeneticists – we need your help with metagenomic data

I have decided to post a question here to my blog requesting help from phylogeneticists everywhere in doing phylogenetic analysis of data from metagenomic projects. Here I will try to describe the problem and then hopefully people out there can chime in on what they think we/others should do to handle this type of data.

So here is the deal. We would like to perform a variety of phylogenetic analyses of data from “environmental shotgun sequencing (ESS)” projects in which one isolates DNA from an environmental sample (e.g., soil, water) and then randomly sequences fragments of that DNA. ESS is in essence a subset of “metagenomics” which is basically the study of the genomes of organisms from environmental samples. (I wrote a brief piece on ESS in PLoS Biology last year which can be found here).

Though there are lots of things we would like to do with phylogenetic analysis of this type of data, I am going to focus here on one specific thing. We would like to take sequence reads that contain matches to a specific gene/gene family (e.g., RecA, my favorite gene), build a multiple sequence alignment that includes these reads as well as all members of this gene family from known organisms, and then build phylogenetic trees from these alignments. (And by we here I mean like totally lots of people, including in particular a Gordon and Betty Moore Foundation funded project called iSEEM I am working on with the labs of Katie Pollard and Jessica Green.)

The challenge with this is really two things. First, we want to analyze just the reads themselves (i.e., we do not want to use assemblies you can make from this type of data). Second, and more importantly, we want to include in our analysis sequence reads that only cover small, not necessarily overlapping regions of the “full length” sequence alignments for the family.

The alignment would look something like

sequence 1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 1 XXXXXXXXX————————-
sequence 2 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 2 ———XXXXXXXXXXXX————-
fragment 3 ———————XXXXXXXXXXXXX
fragment 4 —-XXXXXXXXXXXXXXXXXX————
sequence 3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
sequence 4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
sequence 5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 5 ———————–XXXXXXXXXXX-

where Xs are the regions covered by the sequences/fragments (could be DNA or amino acids)
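To make the shape of this kind of alignment concrete, here is a minimal Python sketch. It assumes a toy alignment stored as a dict of equal-length strings, with "-" marking columns where a row has no data; the row contents and the helper names `covered_columns` and `shared_columns` are made up for illustration and are not from any real pipeline. It simply counts how many columns each pair of rows has in common, which shows that some fragments share no columns with each other even though every fragment overlaps the full length sequences.

```python
from itertools import combinations

# Toy alignment loosely mirroring the sketch above: X = data present, - = no data.
# Row names and lengths are illustrative assumptions, not real sequence data.
ALIGNMENT = {
    "sequence 1": "X" * 34,
    "fragment 1": "X" * 9 + "-" * 25,
    "fragment 2": "-" * 9 + "X" * 12 + "-" * 13,
    "fragment 3": "-" * 21 + "X" * 13,
    "fragment 4": "-" * 4 + "X" * 18 + "-" * 12,
}


def covered_columns(row, gap="-"):
    """Return the set of column indices where the row has data."""
    return {i for i, ch in enumerate(row) if ch != gap}


def shared_columns(row_a, row_b, gap="-"):
    """Count the columns in which both rows have data."""
    return len(covered_columns(row_a, gap) & covered_columns(row_b, gap))


if __name__ == "__main__":
    for a, b in combinations(ALIGNMENT, 2):
        n = shared_columns(ALIGNMENT[a], ALIGNMENT[b])
        print(f"{a} vs {b}: {n} shared columns")
```

In this toy example fragment 1 and fragment 3 share zero columns, so any direct comparison between them has nothing to work with; whatever relates them has to come through the rows that overlap both.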

We want to build trees from these alignments with the hope of using them to learn lots of cool things about the evolution of the fragments and the species from which they come. I can provide more information but really the key part for the phylogenetics here is the nature of the alignment.

In the past, I have decided to constrain my analyses to NOT deal with this type of alignment. I have either analyzed each fragment on its own or we have built a multiple alignment but only included fragments that cover more than 3/4 of the full length sequence, so the matrix is much more filled out. Such an alignment would look like this

sequence 1 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 1 XXXXXXXXXXXXXXXXXXXXXXXXXXX——-
sequence 2 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 2 –XXXXXXXXXXXXXXXXXXXXXXXX——–
fragment 3 —–XXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 4 —-XXXXXXXXXXXXXXXXXXXXXXXXXXXX–
sequence 3 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
sequence 4 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
sequence 5 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
fragment 5 –XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX-
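The 3/4 coverage filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alignment is a dict of equal-length strings with "-" as the gap character, coverage is measured against the number of alignment columns (a stand-in for "full length sequence"), and the function names and the toy rows are hypothetical.

```python
def coverage_fraction(row, gap="-"):
    """Fraction of alignment columns in which the row has data."""
    if not row:
        return 0.0
    return sum(ch != gap for ch in row) / len(row)


def filter_by_coverage(alignment, min_fraction=0.75, gap="-"):
    """Keep only rows that cover more than min_fraction of the columns."""
    return {
        name: row
        for name, row in alignment.items()
        if coverage_fraction(row, gap) > min_fraction
    }


if __name__ == "__main__":
    # Toy rows (illustrative assumption, not real data).
    toy = {
        "sequence 1": "X" * 34,
        "long fragment": "X" * 27 + "-" * 7,                # 27/34 of the columns
        "short fragment": "-" * 9 + "X" * 12 + "-" * 13,    # 12/34 of the columns
    }
    for name in filter_by_coverage(toy):
        print(name)
```

With the default threshold the short fragment is dropped, which is exactly the information we would rather not throw away.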

But we really want to include the smaller fragments in our analysis. And we are just not certain how to best do this. We know LOTS of people out there think of similar problems in terms of sparse matrices, supermatrices, supertrees, EST data, etc. And we have ideas about how to do this and are asking some phylogenetics gurus we know by email. But I thought it might be fun to have the discussion on a blog rather than by email.

So again, how might one best build phylogenetic trees from data that looks like this?

And from these trees we want to place each fragment relative to (1) the full length sequences and (2) the other fragments if possible. We also, of course, want branch lengths to reflect some sort of amount of evolution and thus do not just want a cladogram.

Any suggestions would be appreciated. Fire away with questions too …

Computational Biologists bring together some of the "best jobs" in the US

Check out the WSJ article on “The Best and Worst Jobs in the U.S. – WSJ.com”

The top 5 are
1. Mathematician
2. Actuary
3. Statistician
4. Biologist
5. Software Engineer

Seems like I know a fair # of people who combine #1, #3, #4 and #5 although rarely in equal amounts. I am not sure if anyone out there combines all of the top 5 but there must be some scientist/actuaries doing this computational biology right? Seems like a pretty strange list to me in some ways, but I must say, being a mathematically inclined computational biologist is pretty fun. Now if I only knew statistics …

Hat tip to Lior Pachter for posting this to Facebook where I found it.

Acid Rock Bacteria Genome …

Just a little plug for a new paper of which I am a co-author. This is a report on the analysis of the genome sequence of Acidithiobacillus ferrooxidans which was just published in BMC Genomics (an open access journal, by the way). This paper was a long long time coming – the genome was sequenced when I was at TIGR many years ago (Herve Tettelin coordinated most of the work). Since I was interested in the biology of this bug I volunteered to help turn the sequence into a paper, but was pretty lame about doing that. Thankfully David Holmes and Jorge Valdes in Chile helped make a paper from the data and many additional analyses. Here is the abstract:

Background
Acidithiobacillus ferrooxidans is a major participant in consortia of microorganisms used for the industrial recovery of copper (bioleaching or biomining). It is a chemolithoautrophic, γ-proteobacterium using energy from the oxidation of iron- and sulfur-containing minerals for growth. It thrives at extremely low pH (pH 1–2) and fixes both carbon and nitrogen from the atmosphere. It solubilizes copper and other metals from rocks and plays an important role in nutrient and metal biogeochemical cycling in acid environments. The lack of a well-developed system for genetic manipulation has prevented thorough exploration of its physiology. Also, confusion has been caused by prior metabolic models constructed based upon the examination of multiple, and sometimes distantly related, strains of the microorganism.

Results
The genome of the type strain A. ferrooxidans ATCC 23270 was sequenced and annotated to identify general features and provide a framework for in silico metabolic reconstruction. Earlier models of iron and sulfur oxidation, biofilm formation, quorum sensing, inorganic ion uptake, and amino acid metabolism are confirmed and extended. Initial models are presented for central carbon metabolism, anaerobic metabolism (including sulfur reduction, hydrogen metabolism and nitrogen fixation), stress responses, DNA repair, and metal and toxic compound fluxes.

Conclusion
Bioinformatics analysis provides a valuable platform for gene discovery and functional prediction that helps explain the activity of A. ferrooxidans in industrial bioleaching and its role as a primary producer in acidic environments. An analysis of the genome of the type strain provides a coherent view of its gene content and metabolic potential.

Stan Falkow, only 74 and getting more famous by the day

Nice article in USA Today about Stan Falkow focusing in part on his Lasker Award. Good to see him continue to get some props as he, well, rocks. Note – I wrote about him getting a Lasker Award four months ago here but maybe I was too early?

Thanks "The Open Lab"

Very happy to get this email:

Many congratulations that your post (check http://scienceblogs.com/clock/2009/01/the_open_laboratory_2008_and_t.php for which one) was selected to be part of this year’s print anthology of the best science blogging on the web.

Check out the collection at the link. There is some fun stuff there. I was selected for, what else, my April Fools prank about brain doping. On the one hand, I wish something I wrote about science or policy was picked. On the other hand, I consider this April 1 joke one of the best things I have done on the web …

Ad for Genomics Faculty Position at UC Davis

Still getting back into things after being out sick … here is an Ad for a job everyone should want …

The UC Davis Genome Center integrates experimental and computational approaches to address key problems at the forefront of genomics. The Center is housed in a new research building with state-of-the-art computational and laboratory facilities and currently comprises 14 experimental and computational faculty. These faculty are developing an internationally recognized program in genomics and computational biology at Davis, building on and enhancing the unique strengths and unmatched breadth of the life sciences on the UC Davis campus.

The Genome Center invites applications for tenure-track faculty positions in all areas of genomics with emphasis on next-generation proteomics and statistical genomics involving animal, plant or microbial systems. Applicants interested in genomic approaches to human diseases and investigators employing large-scale, technology-driven approaches that complement existing strengths at UC Davis are particularly encouraged to apply. Candidates should be strongly motivated by the biological importance of their research and should value the opportunity to work in close collaboration with other groups and disciplines.

Candidates may be at any academic level. At the senior level, we invite applications from prominent scientists with distinguished records of research, teaching, and leadership in genomics. At the junior level, we invite applications from candidates whose accomplishments in innovative research and commitments to teaching demonstrate their potential to develop into the future leaders in these fields.

These positions require a Ph.D. or equivalent. Appointments will be at the Assistant, Associate or Full Professor level in an appropriate academic department in any of six schools, or colleges. The position will remain open until filled. For fullest consideration, applicants should submit a letter of application, a curriculum vitae, statements of research and teaching interests, and the names of at least five references to the Genome Center Web site www.genomecenter.ucdavis.edu by January 15, 2009.

The University of California is an affirmative action/equal opportunity employer

What to do when you're sick? Sickblogging (and a little bit about Adm. Dennis Blair)

Well, I have had some unpleasant winter bug, and on top of everything my kids seem to have it or at least something similar. It has been fun at night here to say the least. I was hoping to get some work done over winter break especially since I was overwhelmed with teaching in the fall quarter. That is not happening. But in the few moments of peace here, I have looked for something to do, and hey, there is one thing I could do with only a little time. Blogging. And of course I am not alone in this. So here are some links to others on sickblogging:

And what have I to say today? Not much but here is a preview of things to come. I have a feeling that Obama is stalking me scientifically. Why? Well, I am one or two steps removed from a huge number of his appointees and I plan to write about them in the next few weeks once I get better. One thing the science bloggers might not have heard about is the passion Adm. Dennis Blair has for science. Dennis Blair is Obama’s pick for DNI (Director of National Intelligence) and I know him through a program called the Defense Science Studies Group (DSSG). I will write more about this later, but what I can say here is I think Blair is a brilliant pick by Obama. Not only does he have a strong military and intelligence background, but more importantly to me, he believes in evidence, and is a strong supporter of science. And below is a little pic of me getting my certificate from Adm. Blair.

Open Evolution Highlights – the Population Genetics of dN/dS

There is an interesting new paper in PLoS Genetics (PLoS Genetics: The Population Genetics of dN/dS) by Sergey Kryazhimskiy and Josh Plotkin that discusses the use of the widely used parameter dN/dS (in essence a measure of the ratio of nonsynonymous to synonymous substitution rates in protein coding genes). This parameter is commonly used to estimate the type of selection that has occurred in a protein coding gene.
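For anyone who has not worked with this parameter, here is a minimal Python sketch of the ratio itself. It is a deliberately simplified, uncorrected version: it assumes you already have counts of nonsynonymous and synonymous differences and of the corresponding sites, and it applies no correction for multiple substitutions at the same site, which real estimators do. The function name `dn_ds` and the example counts are made up for illustration and are not taken from the paper.

```python
def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Uncorrected dN/dS: nonsynonymous differences per nonsynonymous site
    divided by synonymous differences per synonymous site."""
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    if syn_diffs == 0:
        raise ValueError("dN/dS is undefined with zero synonymous differences")
    dn = nonsyn_diffs / nonsyn_sites
    ds = syn_diffs / syn_sites
    return dn / ds


if __name__ == "__main__":
    # Made-up counts: 6 nonsynonymous differences over 220 sites,
    # 12 synonymous differences over 80 sites.
    print(round(dn_ds(6, 220, 12, 80), 3))
```

The usual reading is that a ratio well below 1 suggests purifying selection and a ratio above 1 suggests adaptive change, which is the interpretation the paper argues does not carry over to samples from a single population.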

Here is their summary of their article:

Since the time of Darwin, biologists have worked to identify instances of evolutionary adaptation. At the molecular scale, it is understood that adaptation should induce more genetic changes at amino acid altering sites in the genome, compared to amino acid–preserving sites. The ratio of substitution rates at such sites, denoted dN/dS, is therefore commonly used to detect proteins undergoing adaptation. This test was originally developed for application to distantly diverged genetic sequences, the differences among which represent substitutions along independent evolutionary lineages. Nonetheless, the dN/dS statistics are also frequently applied to genetic sequences sampled from a single population, the differences among which represent transient polymorphisms, not substitutions. Here, we show that the behavior of the dN/dS statistic is very different in these two cases. In particular, when applied to sequences from a single population, the dN/dS ratio is relatively insensitive to the strength of natural selection, and the anticipated signature of adaptive evolution, dN/dS>1, is violated. These results have implications for the interpretation of genetic variation sampled from a population. In particular, these results suggest that microbes may experience substantially stronger selective forces than previously thought.

The key to me is that it seems that many may have been using dN/dS ratios inappropriately when comparing samples within a species. For more, well, see the paper.
