Eisen Lab Blog

Germophobia 101: there are microbes on pacifiers; therefore pacifiers cause atherosclerosis & diabetes

Oh my God.  I hope upon hope that the quote in this story was unintended.  The story is from US News and World Report: Dirty Pacifiers May Make Infants Sick: Study – US News and World Report

It is excruciatingly painful to read.  First, the headline is misleading and way out of line.  US News should be reprimanded for this.  There is no evidence presented that pacifiers are making anyone sick.

What is the story about?  At a conference someone(s) presented results of taking used and new pacifiers and chopping them up and seeing what grew on different parts.  And they found – get ready – microbes on them.  And more microbes on the used ones than the new ones.  And they even found some microbes that were apparently resistant to antibiotics.

Scared yet?  You shouldn’t be.  Fortunately the story does quote one sane person:

Dr. Ben Hoffman, medical director of the Children’s Safety Center at Oregon Health and Science University’s Doernbecher Children’s Hospital, said he can’t think of an infection a child has had that he would attribute to a pacifier. 

“The majority of things you’re going to find on a pacifier are things we’ll find on our clothes, normal human flora,” said Hoffman. “It’s not a reason to demonize pacifiers if people find them useful.”

But alas it also quotes the lead author of the study.

Glass doesn’t recommend that parents use pacifiers to calm their babies and toddlers. “After doing the study, I say why take a risk? The key is to recognize that pacifiers can cause illness,” he said. “In the long run, it may be that what you do now [using a pacifier] may have a lot to do with whether a child ends up developing atherosclerosis or type 2 diabetes.”

What? The? Fu$*#? Pacifiers have microbes on them.  Therefore they cause atherosclerosis and diabetes?   Completely, unbelievably insane and irresponsible.  And I think US News should have made it clearer that this is just completely out of line.

Thank you interwebs: help proving fungi are cool

Well, am teaching three lectures this week on Fungal Diversity for BIS002C at UC Davis. And I decided tonight to ask the internet for help finding cool new stories on fungi. And boy did the internet come through in the clutch. Thanks internet. See Storification of Twitter and Facebook discussions below:

View the story “Fungi are cool” on Storify: http://storify.com/phylogenomics/fungi-are-cool.js

Fungi are cool

Storified by Jonathan Eisen · Sun, Nov 04 2012 23:13:14

Twitter Discussion after I asked for suggestions …
Prepping 3 lectures on Fungal Diversity for Intro Bio class at #UCDavis – looking for suggestions for coolest recent fungal stories/studiesJonathan Eisen
@phylogenomics Saccharomyces eubayanus & the new world origin of lager yeast…jashapiro
@phylogenomics well, in the news, jump in fungi in metagenome studies in gulf post BP spill.Kenneth Bruno
@phylogenomics the obvious 1 is the fungal meningitis outbreakKitt Klaiss
@jashapiro reference?Jonathan Eisen
@phylogenomics This looks cool. I’m concerned that intro bio doesn’t have more micro, earlier. Is your syllabus available?Mark O. Martin
@KSBruno9 reference?Jonathan Eisen
@phylogenomics you can discuss Aspergillus around those iatrogenic fungal meningidites…Doctor_Strange
@phylogenomics And I love the whole story on fungi developing ability to degrade lignin leading to end of coal deposits.Kenneth Bruno
@phylogenomics I posted on FB, will get.Kenneth Bruno
@StrangeSource already on the listJonathan Eisen
Microbe domestication and the identification of the wild genetic stock of lager-brewing yeastAbstract Domestication of plants and animals promoted humanity’s transition from nomadic to sedentary lifestyles, demographic expansion, …
UW-Madison: University Communications: News PhotosPhotographs are available to media organizations and University of Wisconsin-Madison departments for news, editorial and public relations…
PLOS ONE: Dramatic Shifts in Benthic Microbial Eukaryote Communities following the Deepwater Horizon Oil SpillPLOS ONE: an inclusive, peer-reviewed, open-access resource from the PUBLIC LIBRARY OF SCIENCE. Reports of well-performed scientific stud…
@phylogenomics I’m fond of Scott Strobel’s endophyte course and the discoveries they’re making w undergrads: http://news.yale.edu/2012/10/25/science-magazine-lauds-yale-science-discovery-courseJeramia Ory
Science Magazine lauds Yale science discovery courseAn innovative Yale science course that encourages undergraduates to discover and study plant-associated organisms has been recognized by …
@KSBruno9 heh – that is by @dr_bik … post doc in my lab …Jonathan Eisen
@phylogenomics @Dr_Bik cool, thought it was a great story. I’m from Louisiana so it was all close to home for me.Kenneth Bruno
@phylogenomics Hey, thank you! My students get taught (not by me) that bacteria are simple, don’t have cytoskeletons, no compartments, etc.Mark O. Martin
@phylogenomics @Dr_Bik Not to say that such a huge shift is necessarily a good thing…Kenneth Bruno
Biodiversity of FungiBiodiversity of Fungi is essential for anyone collecting and/or monitoring any fungi. Fascinating and beautiful, fungi are vital componen…
@phylogenomics Other thing that comes to mind is huge trend toward genome mining for secondary metabolites. Chemistry and genetics.Kenneth Bruno
World’s unique battle -Powerful Arthobortrys fungal adhesive employed to capture soil nematodenandkamat
World’s First video on soil nematode trapped by Drechslerella anchonia mycoadhesivenandkamat
@phylogenomics the responses u r getting is why twitter is cool now will spend part of today reading bout cool fungal researchSponch
May show this video by Louie Schwartzberg on fungi for #UCDavis class this week http://www.youtube.com/watch?v=EDkR2HIlEbc&feature=plcp (although will probably use w/o sound)Jonathan Eisen
Fantastic Fungi: The Forbidden Fruitlouieschwartzberg
@Sponch2 ain’t that the truthJonathan Eisen
@phylogenomics Here is a paleontological fungal/liverwort controversy! http://rlebling.blogspot.com/2012/06/when-funguses-ruled-earth.htmlMark O. Martin
A Strange Manuscript: When Giant Funguses Ruled the EarthAbout 400 million years ago, during the Devonian period, the world was a very strange place. Green plant life had begun to cover the land…
@phylogenomics Not totally fungal, but horizontal transfer of carotenoid production from fungi to aphids is cool. http://www.ncbi.nlm.nih.gov/pubmed/20431015jashapiro
Lateral transfer of genes from fungi underlies carot… [Science. 2010] – PubMed – NCBIPubMed comprises more than 22 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citation…
@jashapiro already on my list to cover .. I taught about this last few years …Jonathan Eisen
@phylogenomics Great story for a diversity course…jashapiro
Fermentation Guru Seeks Out New (and Old) Flavors”Oh, this is nice kimchi,” he said on a summer afternoon at Momofuku Noodle Bar, using chopsticks to pull crimson-coated knuckles of Napa…
@DrLabRatOry @phylogenomics I know the TAs of the class. Could get you in touch for skyping them in or send you pics from the field tripsDenina Hospodsky
Natural Products Version 2.0: Connecting Genes to Molecules – Journal of the American Chemical Society (ACS Publications)Abstract Natural products have played a prominent role in the history of organic chemistry, and they continue to be important as drugs, b…
Exploiting plug-and-play synthetic biology for drug discovery and production in microorganisms : Abstract : Nature Reviews MicrobiologyOne of the most promising applications of synthetic biology is the biosynthesis of new drugs from secondary metabolites. Here, we survey …
Facebook Discussion after I asked for suggestions …
Prepping 3 lectures… | FacebookJonathan Eisen wrote: Prepping 3 lectures on Fungal Diversity… Join Facebook to connect with Jonathan Eisen and others you may know.
I would go for toe jam–that’s always popular.Amy Propps
well, I already talked about fecal transplants a few weeks ago … I think I am going to avoid the gross/semi gross this timeJonathan Eisen
Other than the contaminated steroids?Joanne Manaster
Ooo, that’s a good one!Amy Propps
New Ancient Fungus Finding Suggests World’s Forests Were Wiped Out In Global CatastropheTiny organisms that covered the planet more than 250 million years ago appear to be a species of ancient fungus that thrived in dead wood…
Joanne – will cover the steroids w/o a doubt .. but I want MORE …Jonathan Eisen
Tut Shares Tomb with Former Fungi: Scientific American PodcastBrown stains on the walls of Tut’s tomb are fungal mats, indicating a hurried burial. Cynthia Graber reports The tomb of King Tutenkhamen…
A lot of news coming out of the UK just now about threat to ash trees from Chalara: http://www.bbc.co.uk/news/science-environment-20128172.Neil Saunders
ooh – King Tut fungus ..Jonathan Eisen
Chalara ash dieback outbreak: Q&AThe recent confirmed cases of Chalara ash dieback means it has become the latest threat to UK trees. Within the UK’s woodlands, ash is th…
That article looks like a stub, don’t know if there’s more to it.Amy Propps
Insight into trade-off between wood decay and parasitism from the genome of a fungal forest pathogen – Olson – 2012 – New Phytologist – Wiley Online Library
It’s hard to go past the Cordyceps “zombie fungus” 🙂 Video – http://www.youtube.com/watch?v=XuKjBIBBAL8 and article – http://www.wired.com/wiredscience/2011/03/zombifying-ant-fungus/.Neil Saunders
4 New Species of Zombifying Ant Fungus Found | Wired Science | Wired.comSee Also: Citation: “Hidden diversity behind the Zombie-Ant fungus Ophiocordyceps unilateralis: Four new species described from Carpenter…
Cordyceps: attack of the killer fungi – Planet Earth Attenborough BBC wildlifebbcworldwide
Bacterial-Fungal Interactions: Hyphens between Agricultural, Clinical, Environmental, and Food MicrobiologistsSummary: Bacteria and fungi can form a range of physical associations that depend on various modes of molecular communication for their d…
I think mimicry examples are cool – fungus making a pheremone http://www.pnas.org/content/104/20/8374; Gibberellin, a plant hormone, is also produced by fusarium http://www.plant-hormones.info/gibberellins.htm; Carrion smelling fungi attract flies to spread spores http://www.mapoflife.org/topics/topic_422_Mimicry-in-fungi/;Jason Stajich
Multitrophic interaction facilitates parasite-host relationship between an invasive beetle and the honey beeInternational Centre of Insect Physiology and Ecology, P.O. Box 30772-00100, Nairobi, Kenya; ‡Institute of Food and Agricultural Sciences…
Plant Hormones GibberellinsGibberellins are diterpenes synthesized from acetyl CoA via the mevalonic acid pathway. They all have either 19 or 20 carbon units groupe…
“Mimicry in fungi” : Map of LifeMost of us are familiar with fungi in the form of mushrooms, some of which are brightly coloured and not likely to be mistaken for anythi…
Fungus + Virus = heat tolerance for thermophilic grasses http://www.ncbi.nlm.nih.gov/pubmed/12446900 and http://www.sciencemag.org/content/315/5811/513Jason Stajich
Thermotolerance generated by plant/fungal symbiosis. [Science. 2002] – PubMed – NCBIPubMed comprises more than 22 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citation…
A Virus in a Fungus in a Plant: Three-Way Symbiosis Required for Thermal ToleranceA mutualistic association between a fungal endophyte and a tropical panic grass allows both organisms to grow at high soil temperatures. …
It’s not new, but there’s a lot of good population biology/epidemiology on sudden oak death (Phytophthera ramorum)–especially the function of landscaping plants as vectors. If you want to use human health as a hook, you can’t go wrong with fungal sec…Ken Callicott
Ken – phytophtera is not a fungus though …Jonathan Eisen
That’s right! I keep forgetting that work that showed them completely unrelated (well, you know what I mean) to the actual fungi–the curse of spending too much time around people who refer to them as water molds.Ken Callicott
look this video Jonathan Eisen….it’s amazing.. http://www.youtube.com/watch?v=Pq1x5V2-3w0Felipe Gainza Cortes
BBC Planet Earth Cordyceps Fungus Finding of the holy mushroom – Diknek lorrie’sdnlscratcher
The deadly chytrid fungus: a story of an emergin… [PLoS Pathog. 2010] – PubMed – NCBIPubMed comprises more than 22 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citation…
underwater fruiting Psathyrella in Oregon discoved 3 years ago: http://2pat.files.wordpress.com/2011/05/underwater-mushroom-oregon.jpgDamon Tighe
WordPress
From a business side of things: 1) Using Oyster mushrooms to treat terrestial oil spills 2) BTTR in Emeryville that uses the left over coffee grounds to produce oyster mushrooms from whole foods etc 3) Eben Bayer’s TED talk on using mycellium to make packaging materials 4) Staments TED talk…fungus as pesticide replacement. 5) Cordyceps sinensis – Asia’s viagra, easily one of the most expensive mushrooms in the world and the hunt for them is causing major environmental problems in India the past summers.Damon Tighe
Eben Bayer: Are mushrooms the new plastic? | Video on TED.comTED Talks Product designer Eben Bayer reveals his recipe for a new, fungus- based packaging material that protects fragile stuff like fu…
Paul Stamets: 6 ways mushrooms can save the world | Video on TED.comTED
John Taylor’s recent reverse ecology study of NeurosporaMatthew Kane
Cambridge librarian finds forgotten fungus Charles Darwin brought back on the Beagle (and it was still wrapped in his newspaper)Fungi and seaweed collected by Charles Darwin on the Beagle Voyage has been uncovered wrapped in newspaper in a Cambridge University libr…
Also James Scott’s work on the angel’s share fungus http://www.wired.com/magazine/2011/05/ff_angelsshare/ and for the future lawyers in the room, what happens when fungal growth can be attributed to the distillers and thus they can be sued for this as …Jason Stajich
The Mystery of the Canadian Whiskey Fungus | Wired Magazine | Wired.comThe air outside a distillery warehouse smells like witch hazel and spices, with notes of candied fruit and vanilla-warm and tangy- mellow…

Quick post: nice microbial genomes database: MBGD (hat tip to Google Scholar Updates)

Just discovered this paper: MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.  Seems to be a useful microbial genomes database with some nice associated tools.  Among the potentially useful features:

General Ortholog Table
Select your own organisms for a Custom Ortholog Table
Add your own genome in My MBGD Mode

And more.  Anyway – worth checking out.

I note – I found out about this via Google Scholar Updates:

For more on Scholar Updates see here.

You win some, you lose some

Our project is starting to pick up! After our initial sampling/sequencing period, we realized that there is actual DNA we can work with from the tanks. This past week, we started our actual sample collecting from the tropical tank. We collected 3 sets of samples from the sediment, walls, and water. Throughout the week, we extracted the DNA and ran PCR on all 9 samples (plus one negative control). Today, we completed the gel electrophoresis and got some unpleasant results. Unfortunately, we couldn’t see the primer bands and the DNA bands didn’t show up like we thought they would. This means something went wrong in our PCR, but we don’t know if it was during PCR16SA or PCR16SB. Well, it’s back to the drawing board! Starting next week, we will be re-running the PCR on the 9 samples and possibly collecting more samples from other tanks.

Although this week’s results were a bust, we know that there is definitely some DNA present that we can work with. I’m sure we’ll be finding some pretty cool things as we continue sampling and sequencing. 🙂

Story behind the Paper: Functional biogeography of ocean microbes

Guest Post by Russell Neches, a PhD Student in my lab and Co-Author on a new paper in PLoS One.  Some minor edits by me.


For this installment of the Story Behind the Paper, I’m going to discuss a paper we recently published in which we investigated the geographic distribution of protein function among the world’s oceans. The paper, Functional Biogeography of Ocean Microbes Revealed through Non-Negative Matrix Factorization, came out in PLOS ONE in September, and was a collaboration among Xingpeng Jiang (McMaster, now at Drexel), Morgan Langille (UC Davis, now at Dalhousie), myself (UC Davis), Marie Elliot (McMaster), Simon Levin (Princeton), Jonathan Eisen (my adviser, UC Davis), Joshua Weitz (Georgia Tech) and Jonathan Dushoff (McMaster).

Using projections to “see” patterns in complex biological data

Biology is notorious for its exuberant abundance of factors, and one of its central challenges is to discover which among a large group of factors are important for a given question. For this reason, biologists spend a lot of time looking at tables that might resemble this one:

            sample A    sample B    sample C
factor 1    3.3         5.1         0.3
factor 2    1.1         9.3         0.1
factor 3    17.1        32.0        93.1


Which factors are important? Which differences among samples are important? There are a variety of mathematical tools that can help distill these tables into something perhaps more tractable to interpretation. One way or another, all of these tools work by decomposing the data into vectors and projecting them into a lower dimensional space, much the way an object casts a shadow onto a surface.


The idea is to find a projection that highlights an important feature of the original data. For example, the projection of the fire hydrant onto the pavement highlights its bilateral symmetry.
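As a minimal sketch of what "projection" means here, the snippet below treats each sample column of the toy table as a point in three-dimensional factor space and casts its "shadow" onto two different planes. The choice of planes and the names `project`, `ground` and `screen` are assumptions made for illustration; none of this is taken from the paper.

```python
# Toy table from the text: each sample is a point in
# (factor 1, factor 2, factor 3) space.
samples = {
    "sample A": [3.3, 1.1, 17.1],
    "sample B": [5.1, 9.3, 32.0],
    "sample C": [0.3, 0.1, 93.1],
}


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(vector, basis):
    """Coordinates of `vector` in the subspace spanned by an orthonormal `basis`."""
    return [dot(vector, axis) for axis in basis]


# Two hypothetical projection surfaces, each given by two orthonormal axes.
ground = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # keeps factors 1 and 2
screen = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # keeps factors 1 and 3

for name, vector in samples.items():
    print(name, "ground:", project(vector, ground), "screen:", project(vector, screen))
```

On the "ground", sample C sits almost at the origin; on the "screen", it is the most extreme point. Same data, different orientation, different story.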

So, projections are very useful. Many people have a favorite projection, and like to apply the same one to every bunch of data they encounter. This is better than just staring at the raw data, but different data and different effects lend themselves to different projections. It would be better if people generalized their thinking a little bit.

When you make a projection, you really have three choices. First, you have to choose how the data fits into the original space. There is more than one valid way of thinking about this. You could think about it as arranging the elements into vectors, or deciding what “reshuffling” operations are allowed. Then, you have to choose what kind of projection you want to make. Usually people stick with some flavor of linear transformation. Last, you have to choose the space you want to make your projection into. What dimensions should it have? What relationship should it have with the original space? How should it be oriented?

In the photograph of the fire hydrant, the original data (the fire hydrant) is embedded in a three dimensional space, and projected onto the ground (the lower dimensional space) by the sunlight by casting a shadow. The ground happens to be roughly orthogonal to the fire hydrant, and the sunlight happens to fall from a certain angle. But perhaps this is not the ideal projection. Maybe we’d get a more informative projection if we put a vertical screen behind the fire hydrant, and used a floodlight? Then we’d be doing the same transformation on the same representation of the data, but into a space with a different orientation. Suppose we made the fire hydrant semi-transparent, placed it inside a tube-shaped screen, and illuminated it from within? Then we’d be using a different representation of the original data, and we’d be doing a non-linear projection into an entirely different space with a different relationship with the original space. Cool, huh?

It’s important to think generally when choosing a projection. When you start trying to tease some meaning out of a big data set, the choice of principal component analysis, or k-means clustering, or canonical correlation analysis, or support vector machines, has important implications for what you will (or won’t) be able to see.

How we started this collaboration: a DARPA project named FunBio

Between 2005 and 2011, DARPA had a program humbly named The Fundamental Laws of Biology (FunBio). The idea was to foster collaborations among mathematicians, experimental biologists, physicists, and theoretical biologists — many of whom already bridged the gap between modeling and experiment. Simon Levin was the PI of the project and Benjamin Mann was the program officer. The group was large enough to have a number of subgroups that included theorists and empiricists, including a group focused on ecology. Jonathan Eisen was the empiricist for microbial ecology, and was very interested in the binning problem for metagenomics; that is, classifying reads, usually by taxonomy. Conversations in and out of the program facilitated the parallel development of two methods in this area: LikelyBin (led by Andrey Kislyuk and Joshua Weitz with contributions from Srijak Bhatnagar and Jonathan Dushoff) and CompostBin (led by Sourav Chatterji and Jonathan Eisen with contributions from collaborators). At this stage, the focus was more on methods than biological discoveries.

The binning problem highlights some fascinating computational and biological questions, but as the program developed, the group began to tilt in the direction of biological problems. For example, Simon Levin was interested in the question: Could we identify certain parts of the ocean that are enriched for markers of social behavior?

One of the key figures in any field guide is an ecosystem map. These maps are the starting point from which a researcher can orient themselves when studying an ecosystem by placing their observations in context.

Handbook of Birds of the Western United States, Florence Merriam Bailey 
There are a variety of approaches one could take that entail deep questions about how best to describe variation in taxonomy and function. For example, we could try to find “canonical” examples of each ecosystem, and then perhaps identify intermediates between them. Similarly, we could try to find “canonical” examples of the way different functions are distributed across ecosystems and identify intermediates between them.

In the discussions that followed, we considered how to describe the functional and taxonomic diversity in a community as revealed via metagenomics; that is, how do we describe, identify and categorize ecosystems and their associated functions? In order to answer this question, we had to confront a difficult issue: how to quantify and analyze metagenomic profile data.

Metagenomic profile data: making sense of complexity at the microbe scale

Metagenomics is perhaps the most pervasive cause of the proliferation of giant tables of data that now beset biology. These tables may represent the proportion of taxa at different sites, e.g., as measured across a transect using operational taxonomic units as proxies for distinct taxa. Among these giant tables, one of the challenges that has been brought to light is that there can be a great deal of gene content variability among individuals of a single taxon. As a consequence, obtaining the taxonomic identities of organisms in an ecosystem is not sufficient to characterize the biological functions present in that community. Furthermore, ecologists have long known that there are often many organisms that could potentially occupy a particular ecological niche. Thus, using taxonomy as a proxy for function can lead to trouble in two different ways: first, the organism you’ve found might be doing something very different from what it usually does, and second, the absence of an organism that usually performs a particular function does not necessarily imply the absence of that function. So, it’s rather important to look directly at the genes in the ecosystem, rather than relying on proxies. You can see where this is going, I’m sure: Metagenomics, the cure for the problems raised by metagenomics!

When investigating these ecological problems, it is easy to take for granted the ability to distinguish one type of environment from another. After all, if you were to wander from a desert to a jungle, or from forest to tundra, you could tell just by looking around what kind of ecosystem you were in (at least approximately). Or, if the ecosystems themselves were new to you, it should at least be possible to notice when you have stepped from one into another. However, there is a strong anthropic bias operating here, because not all ecosystems are visible on human scales. So, how do you distinguish one ecosystem from another if you can’t see either?

One way is to look at the taxa present, but that works best if you are already somewhat familiar with that ecosystem. Another way is to look at the general properties of the ecosystem. With microbial ecosystems, we can look at predicted gene functions. Once again, this line of reasoning points to metagenomics.

We wanted to use a projection method that avoids drawing hard boundaries, reasoning that hard boundaries can lead to misleading results due to over-specification. Moreover, Jonathan Dushoff advocated for a method that had the benefits of “positivity”, i.e., the projection would be done in a space where the components and their weights were positive, consistent with the data, which would help the interpretability of our results. This is the central reason why we wanted to use an alternative to PCA. The method Jonathan Dushoff suggested was Non-negative Matrix Factorization (NMF). This choice led to a number of debates and discussions, in part because NMF is not a “standard” method (yet), though it has seen increasing use within computational biology: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000029. It is worth talking about these issues to help contextualize the results we did find.

The Non-negative Matrix Factorization (NMF) approach to projection

The conceptual idea underlying NMF (and a few other dimensional reduction methods) is a projection that allows entities to exist in multiple categories. This turns out to be quite important for handling corner cases. If you’ve ever tried to organize a music library, you’ve probably encountered this problem. For example, you’ve probably created categories like Jazz, Blues, Rock, Classical and Hip Hop. Inevitably, you find artists who don’t fit into the scheme. Does Porgy and Bess go into Jazz or Opera? Does the soundtrack for Rent go under Musicals or Rock? What the heck is Phantom of the Opera anyway? If your music library is organized around a hierarchy of folders, this can be a real headache, and results either in sacrificing information by arbitrarily choosing one legitimate classification over another, or in creating artistically meaningless hybrid categories.

This problem can be avoided by relaxing the requirement that each item must belong to exactly one category. For music libraries, this is usually accomplished by representing categories as attribute tags, and allowing items to have more than one tag. Thus, Porgy and Bess can carry the tags Opera, Jazz and Soundtrack. This is more informative and less brittle.

NMF accomplishes this by decomposing large matrices into smaller matrices with non-negative components. These decompositions often do a better job at clustering data than eigenvector based methods for the same reason that tags often work better for organizing music than folders. In ecology, the metabolic profile of a site could be represented as a linear combination of site profiles, and the site profile of a taxonomic group could be represented as a linear combination of taxonomic profiles. When we’ve tried this, we have found that although many sites, taxa and Pfams have profiles close to these “canonical” profiles, many are obviously intermediate combinations. That is to say, they have characteristics that belong to more than one classification, just as Porgy and Bess can be placed in both Jazz and Opera categories with high confidence. Because the loading coefficients within NMF are non-negative (and often sparse), they are easy to interpret as representing the relative contributions of profiles.
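To make the decomposition concrete, here is a minimal sketch of NMF using the multiplicative update rules of Lee and Seung, a common textbook algorithm. It is written in plain Python for readability and is not claimed to be the implementation used in the paper; the toy matrix `V`, the function names and the default parameters are all assumptions for illustration. Rows stand in for Pfams and columns for sites, with the third site built as an even mix of the other two.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def squared_error(V, W, H):
    """Sum of squared differences between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf(V, rank, iterations=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, so entries that start positive never
    become negative. `eps` guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return W, H


# Hypothetical profile matrix: 4 Pfams (rows) x 3 sites (columns).
V = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.0, 1.0, 0.5],
]
W, H = nmf(V, rank=2)
print("reconstruction error:", squared_error(V, W, H))
```

Every entry of `W` and `H` stays non-negative, so the third site can be described as "some of component one plus some of component two" rather than as a positive amount of one thing and a negative amount of another.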

What makes NMF really different from other dimensional reduction methods is that these category “tags” are positive assignments only. Eigenvector methods tend to give both positive and negative assignments to categories. This would be like annotating Anarchy in the U.K. by the Sex Pistols with the “Classical” tag and a score of negative one, because Anarchy in the U.K. does not sound very much like Frédéric Chopin’s Tristesse or Franz Liszt’s Piano Sonata in B minor. While this could be a perfectly reasonable classification, it is conceptually very difficult to wrap one’s mind around concepts like non-Punk, anti-Jazz or un-Hip-Hop. From an epistemological point of view, it is preferable to define things by what they are, rather than by what they are not.

To give you an idea of what this looks like when applied to ecological data, it is illustrative to see how the Pfams we found in the Global Ocean Survey cluster with one another using NMF, PCA and direct similarity:

While PCA seems to over-constrain the problem and direct similarity seems to under-constrain the problem, NMF clustering results in five or six clearly identifiable clusters. We also found that within each of these clusters one type of annotated function tended to dominate, allowing us to infer broad categories for each cluster: Signalling, Photosystem, Phage, and two clusters of proteins with distinct but unknown functions. Finally – in practice, PCA is often combined with k-means clustering as a means to classify each site and function into a single category. Likewise, NMF can be used with downstream filters to interpret the projection in a “hard” or “exclusive” manner. We wanted to avoid these types of approaches.

Indeed, some of us had already had some success using NMF to find a lower-dimensional representation of these high-dimensional matrices. In 2011, Xingpeng Jiang, Joshua Weitz and Jonathan Dushoff published a paper in JMB describing an NMF-based framework for analyzing metagenomic read matrices. In particular, they introduced a method for choosing the factorization degree in the presence of overlap, and applied spectral-reordering techniques to NMF-based similarity matrices to aid in visualization. They also showed a way to robustly identify the appropriate factorization degree that can disentangle overlapping contributions in metagenomics data sets.

While NMF has its advantages, we should note that it also comes with caveats. For example, the projection is non-unique and the dimensionality of the projection must be chosen carefully. To find out how we addressed these issues, read on!
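One simple source of non-uniqueness is easy to show: any positive rescaling of a component can be absorbed by the matching row of the other factor without changing the product. The sketch below uses made-up matrices `W` and `H` and a hypothetical helper `normalize_components` to demonstrate this, along with one common convention for removing the ambiguity. It covers only the scaling ambiguity and does not describe how the paper handled uniqueness or rank selection.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def normalize_components(W, H):
    """Rescale so each column of W sums to 1, moving the scale into H."""
    sums = [sum(col) for col in zip(*W)]
    W_n = [[w / s for w, s in zip(row, sums)] for row in W]
    H_n = [[h * s for h in row] for row, s in zip(H, sums)]
    return W_n, H_n


# Made-up non-negative factors.
W = [[1.0, 2.0], [3.0, 4.0]]
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]

# Arbitrary positive rescaling of each component.
scales = [2.0, 4.0]
W_scaled = [[w * s for w, s in zip(row, scales)] for row in W]
H_scaled = [[h / s for h in row] for row, s in zip(H, scales)]

# Different factors, same product.
print(matmul(W, H) == matmul(W_scaled, H_scaled))  # prints True

# After normalization the two factorizations coincide.
print(normalize_components(W, H) == normalize_components(W_scaled, H_scaled))  # prints True
```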

Using NMF as a tool to project and understand metagenomic functional profile data

We analyzed the relative abundance of microbial functions as observed in metagenomic data taken from the Global Ocean Survey dataset. The choice of GOS was motivated by our interest in ocean ecosystems and by the relative richness of metadata and information on the GOS sites that could be leveraged in the course of our analysis. In order to analyze microbial function, we restricted ourselves to the analysis of reads that could be mapped to Pfams. Hence, the matrices have columns which denote sampling sites, and rows which denote distinct Pfams. The value in each cell denotes the relative number of Pfam matches at that site, where we normalize so that the sum of values in a column equals 1. In total, we ended up mapping more than six million reads into an 8214 x 45 matrix.
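The column normalization described above can be sketched in a few lines. The counts here are invented (three Pfams at two sites) and the function name `normalize_columns` is an assumption; the real matrix is 8214 x 45.

```python
def normalize_columns(counts):
    """Divide each entry by its column total so every column sums to 1.

    A column with no matches at all is left as zeros.
    """
    totals = [sum(col) for col in zip(*counts)]
    return [
        [value / total if total else 0.0 for value, total in zip(row, totals)]
        for row in counts
    ]


# Hypothetical read counts: rows are Pfams, columns are sampling sites.
counts = [
    [1, 1],
    [1, 3],
    [2, 0],
]
profile = normalize_columns(counts)
for row in profile:
    print(row)
```

Normalizing by column means each site is described by the relative mix of functions found there, so sites sampled at different sequencing depths can be compared directly.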

We then utilized NMF tools for analyzing metagenomic profile matrices, and developed new methods (such as a novel approach to determining the optimal rank), in order to decompose our very large 8214 x 45 profile matrix into a set of 5 components. This projection is the key to our analysis, in that it highlights most of the meaningful variation and provides a means to quantify that variation. We spent a lot of time talking among ourselves, and then later with our editors and reviewers, about the best way to explain how this method works. Here is our best effort from the Results section that explains what these components represent:

Each component is associated with a “functional profile” describing the average relative abundance of each Pfam in the component, and with a “site profile”, describing how strongly the component is represented at each site. 

A component has both a column vector representing how much each Pfam contributes to the component and a row vector representing the importance of that component at different sites. Each Pfam may be associated with more than one component. Likewise, each component can have a different strength at each site. Remember the music library analogy? This is how NMF achieves the effect of category “tags” which can label overlapping sets of items, rather than “folders” which must contain mutually exclusive sets.
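To make the column-vector and row-vector picture concrete, here is a small sketch that factors a toy profile matrix V (Pfams x sites) into W (Pfams x components) and H (components x sites) so that V is approximately W times H. It uses the generic Lee and Seung multiplicative updates as a stand-in; this is an assumption made for illustration, not the algorithm or the rank-selection method used in the paper, and all names and data here are hypothetical.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Frobenius norm of the residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor non-negative V into non-negative W and H via multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_cols)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n_rows)]
    return w, h

# Toy profile (made up): 4 Pfams x 3 sites, columns sum to 1.
# The third site is an even mix of the patterns seen at the first two sites.
v = [
    [0.5, 0.0, 0.25],
    [0.5, 0.0, 0.25],
    [0.0, 0.5, 0.25],
    [0.0, 0.5, 0.25],
]
w, h = nmf(v, rank=2)
print("functional profiles (columns of W):", w)
print("site profiles (rows of H):", h)
print("reconstruction error:", frobenius_error(v, w, h))
```

In this toy case the third site would be expected to carry weight on both components, which is the “tag” behaviour described above: nothing forces a site, or a Pfam, into a single exclusive group.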

Such a projection does not exclusively cluster sites and functions together. We discovered five functional types, but we are not claiming that any of these five functional types are exclusive to any particular set of sites. This is a key distinction from concepts like enterotypes.

What we did find is that of these five components, three of them had an enrichment for Pfams whose ontology was often identified with signalling, phage, and photosystem function, respectively. Moreover, these components tended to be found in different locations, but not exclusively so. Hence, our results suggest that sampling locations had a suite of functions that often co-occurred there together.

We also found that many Pfams with unknown functions (DUFs, in Pfam parlance) clustered strongly with well-annotated Pfams. These are tantalizing clues that could perhaps lead to discovery of the function of large numbers of currently unknown proteins. Furthermore, it occurred to us that a larger data set with richer metadata might perhaps indicate the function of proteins belonging to clusters dominated by DUFs. Unfortunately, we did not have time to fully pursue this line of investigation, and so, with a wistful sigh, we kept in the basic idea, with more opportunities to consider this in the future. We also did a number of other analyses, including analyzing the variation in function with respect to potential drivers, such as geographic distance and environmental “distance”. This is all described in the PLoS ONE paper.

So, without re-keying the whole paper, we hope this story-behind-the-story gives a broader view of our intentions and background in developing this project. The reality is that we still don’t know the mechanisms by which components might emerge, and we would still like to know where this component view of ecosystem function will lead. Nevertheless, we hope that alternatives to exclusive clustering will be useful in future efforts to understand complex microbial communities.


Full Citation: Jiang X, Langille MGI, Neches RY, Elliot M, Levin SA, et al. (2012) Functional Biogeography of Ocean Microbes Revealed through Non-Negative Matrix Factorization. PLoS ONE 7(9): e43866. doi:10.1371/journal.pone.0043866.


A soothing microbiome music video of sorts

For all you microbiome geeks out there, here is a music video of sorts from Antonio Gonzalez Peña in Rob Knight’s lab.

Seminar: Valerian Dolja “Evolution of the Virus World” #UCDavis 11/5


Plant Pathology 290 Graduate Seminar Series

Plant Pathology

UC Davis

“Evolution of the Virus World”

Dr. Valerian Dolja

Professor, Department of Botany and Plant Pathology,

Oregon State University

Monday, November 5, 2012

9:00-9:50 a.m.

115 Hutchison


#UCDavis Genome Center Symposium 10/31 GBSF 1005

The UC Davis Genome Center Symposium will be held Wednesday Oct 31, from 8:30AM to 4PM in the GBSF Auditorium. The focus this year is on genomics of human diseases and on metagenomics. The keynote speaker, Dr. Janet Jansson from UC Berkeley, will give a talk at 1PM entitled: “Omics exploration of the human gut microbiome”. In addition, the symposium features interactive forums, individual talks, food (breakfast, lunch, coffee breaks), and an exhibition of carved pumpkins. Everybody is welcome, costumed or not.

Lab meeting. Tuesday, Oct. 30th 2012

Ladan Doroud will be presenting at this week’s lab meeting. We will be meeting at the Genome Center from 1:30 to 3:30pm in room 4202.

And now the Human Microbiome has its own National Academy Report

Very interesting: there is a new workshop summary out from the Institute of Medicine of the National Academy of Sciences: The Human Microbiome, Diet, and Health – Workshop Summary – Institute of Medicine. From the summary:

“One of the most intimate relationships that our body has with the outside world is through our gut. Our gastrointestinal tracts harbor a vast and still largely unexplored microbial world known as the human microbiome that scientists are only just beginning to understand. Researchers are recognizing the integral role of the microbiome in human physiology, health, and disease — with microbes playing critical roles in many host metabolic pathways — and the intimate nature of the relationships between the microbiome and both host physiology and host diet. While there is still a great deal to learn, the newfound knowledge already is being used to develop dietary interventions aimed at preventing and modifying disease risk by leveraging the microbiome. 

The IOM’s Food Forum held a public workshop on February 22-23, 2012, to explore current and emerging knowledge on the human microbiome, its role in human health, its interaction with the diet, and the translation of new research findings into tools and products that improve the healthfulness of the food supply. This document summarizes the workshop.”

I was unable to go but am very interested in the topic.  Fortunately one can get the report for free.  And I will be reading it ASAP.