The story behind the paper by @JeremyJBarr on phage using mucus to hunt prey

This is a guest post by Jeremy Barr about a new paper of his. Also see his previous post from 2013: Story behind the paper: from Jeremy Barr on “Bacteriophage and mucus. Two unlikely entities, or an exceptional symbiosis?”

The story behind the paper “Subdiffusive motion of bacteriophage in mucosal surfaces increases the frequency of bacterial encounters”
Here’s the story behind our recent publication on the subdiffusive motion of bacteriophage in mucus published in PNAS – a manuscript that builds on our Bacteriophage Adherence to Mucus (BAM) model of phage-derived immunity. You can also find a recent write up on the work by San Diego State University (SDSU) News Center here.
In early 2013, I attended my first Keystone Symposia conference on “Emerging Topics in Immune System Plasticity” in Santa Fe, New Mexico. Apart from the excellent snow conditions, I was beginning to question my decision to attend an immunity conference as an experimental microbiologist, but one of the last presentations at the conference, given by Christopher Hunter from UPenn, stuck with me. The Hunter lab was investigating the ability of CD8+ T cells to control the parasite Toxoplasma gondii in the brains of mice. Using a powerful microscopy system, they were able to watch T cell movement in real time while the cells searched the brain for the sparsely distributed parasites. They found the T cells moved in a specific pattern, characterized by many short-distance movements interspersed with occasional longer-distance flights to a new area. This search strategy is known as a Lévy flight, and it allowed the T cells to more effectively search an area of the brain for hiding Toxoplasma than if they searched by directed or random motion (see paper here). Once I saw this talk, the idea behind our paper was planted. I knew that by adhering to mucus, bacteriophage could also use this strategy to hunt bacteria, but it wasn’t until a couple of years later that I was able to test this hypothesis.
The makings of a microfluidic mucus layer.
During this time, I had been reading a number of papers that were reconstituting organ-level functions on microfluidic devices, making simulated lung or gut environments.

Recognizing the potential of these systems, I began working with Samuel Kassegne and his Master’s student Nicholas Sam-Soon in the Department of Mechanical Engineering at San Diego State University (SDSU) to develop our own microfluidic ‘chip’ designed to simulate a mucus layer with fluid flow and secretion dynamics. I had no idea how difficult this endeavor would be. Our first chip was as close to a complete failure as one could get. The device leaked, it was dirty, and I had the bright idea that we could simply poke a syringe into the chip to set up fluid flow.

But we persevered. We continually solved problem after problem, with every solution leading to new problems, be it leaks, growths, or cracks in the chip. Two years and a Master’s thesis later, the system was finally working at a useful throughput for us to experimentally test. We could now run up to nine chips simultaneously and immediately set out to recapitulate our prior results – that mucus-adherent phage protected mucosal epithelium from bacterial infections.
What we found from these experiments was quite surprising. Firstly, I should explain that the model system we were using was phage T4, a strictly lytic phage that infects and kills Escherichia coli and that we previously showed was capable of adhering to mucus, and a T4∆hoc phage that is equally capable of killing E. coli but lacks the capsid proteins required to adhere to mucus. When we infected the chips with E. coli and the non-mucus-adherent T4∆hoc phage, we found that these phage-treated chips were no better at reducing bacterial abundance in the mucus layer than control chips where no phage had been added at all. Meanwhile, the mucus-adherent T4 phage was capable of reducing bacterial colonization in the mucus by over 4000-fold. We next investigated whether differences in phage accumulation or persistence in the mucus could explain this stark difference, but we found no effect. The question remained: why were the mucus-adherent phage better at finding and reducing bacteria in mucus than the same phage that could not stick?

Weekly math meetings to the rescue
For the last four and a half years I have been extremely fortunate to have the opportunity to work as both a post-doc and now an adjunct faculty in Forest Rohwer’s lab at SDSU. During that time, one of Forest’s many punishments for me was compulsory, weekly Bio-Math meetings, which are still being run here at SDSU. These meetings were something that I initially rebelled against – what good could math do me? But as I unwillingly persisted, I came to realize the value in using math to describe biological systems. This is especially true for phages that play the game of life at a speed and scale that is at times incomprehensible.
Over time, I came to have my own weekly math meetings with a group of SDSU mathematicians, statisticians, and physicists. I owe a big thanks to Peter Salamon, Arlette Baljon, Jim Nulton, and Ben Felts, who all took countless hours out of their days to meet with me and discuss the complexities of diffusion. During these meetings we analyzed hundreds of thousands of data points detailing phage diffusivity in mucus, and eventually we answered the question as to why mucus-adherent phage were better at reducing bacterial numbers – the phage were employing a search strategy to hunt bacteria in mucus. But this search strategy was not the same as the Lévy flights I had seen the T cells use at the conference talk years earlier. This was something different, something that no predator had been shown to utilize before. Our phage were using a type of motion known as subdiffusion.
Phage are like ticks in a grass field
We found that phage that adhere weakly to mucus, through reversible binding interactions to one or more mucin strands, exhibit subdiffusive motion, not normal diffusion, in mucosal surfaces. The question now was what that means for the phages. What benefit could subdiffusive motion provide?
Subdiffusion is a very abstract concept that is difficult to explain without mathematical formulas, and we spent many hours discussing the possible biological implications. Subdiffusive particles move slower and slower over time, remaining in their original positions longer, and in certain models the chance of finding a nearby target is significantly increased. Using similar logic, we hypothesized that mucus-adherent phage moved slower in specific regions of the mucus layer, remained near sites of productive bacterial infections, and concentrated in regions of the mucus that overlapped the niche of their bacterial host – all resulting in a greater chance for the phage to encounter a bacterium. Now we just had to prove it.
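
For readers who do want a formula, one common textbook way to write anomalous diffusion is through the mean squared displacement. This is only an illustration of the general concept, not a reproduction of the analysis in the paper; the symbols (the exponent α and the prefactor K_α) are notation assumed here for the example.

```latex
\langle \Delta r^2(t) \rangle = K_\alpha \, t^{\alpha},
\qquad
\frac{\langle \Delta r^2(t) \rangle}{t} = K_\alpha \, t^{\alpha - 1}
```

With α = 1 this is normal diffusion and the ratio on the right is constant. With 0 < α < 1 the motion is subdiffusive, and the ratio on the right, a kind of effective diffusivity, shrinks as time goes on, which is one way to read "slower and slower over time".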

One of the beautiful things about phage biology is the detailed and expansive literature published over the last 100 years. Going back through these papers, we found a classical phage experiment that was first published in 1932 by Martin Schlesinger. This experiment measured the adsorption rate of a specific phage to its bacterial host. Using this assay, we showed that phage adsorption rate was increased in mucin solutions at low, but not high, bacterial concentrations. The logic here is that when bacterial hosts are abundant, the chance of a random phage-host encounter is high, and any improvement in the search strategy employed doesn’t provide a noticeable benefit. But when bacterial abundance is low and chance phage-host encounters are comparatively low, performing a more efficient search can greatly improve the chances of a successful infection.
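
As a rough illustration of what an adsorption assay measures, the classical textbook treatment assumes that free phage disappear with first-order kinetics, P(t) = P0 · exp(−k·B·t), where B is the bacterial concentration and k is the adsorption rate constant. The sketch below solves that expression for k. The function name, variable names, and titers are hypothetical, and this is not the exact calculation used in the paper.

```python
import math


def adsorption_rate_constant(free_start, free_end, bacteria_per_ml, minutes):
    """Estimate k (mL/min), assuming free phage decay as exp(-k * B * t).

    free_start, free_end: free (unadsorbed) phage titers at time 0 and time t.
    bacteria_per_ml: host concentration B, assumed constant during the assay.
    minutes: elapsed time t.
    """
    if not 0 < free_end <= free_start:
        raise ValueError("free phage titers must satisfy 0 < end <= start")
    if bacteria_per_ml <= 0 or minutes <= 0:
        raise ValueError("bacterial concentration and time must be positive")
    return math.log(free_start / free_end) / (bacteria_per_ml * minutes)


# Hypothetical numbers, chosen only to show the arithmetic:
# half of the free phage adsorb in 10 minutes at 1e7 bacteria per mL.
k_example = adsorption_rate_constant(1e5, 5e4, 1e7, 10)
print(f"k = {k_example:.2e} mL/min")
```

Comparing k estimated this way with and without mucin, at low and at high bacterial concentrations, is the kind of comparison described above.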
The implications here become apparent when we consider that phages are typically quite specific and that mucosal surfaces harbor a large diversity of bacterial hosts – dynamics that reduce the chance of any successful phage-host encounter. From the perspective of the phage subdiffusing within a mucus layer, the world is a three-dimensional web, and like ticks in a grass field, the phage are holding onto the mucus network, awaiting a bacterial host.
The publication process
I presented this work at another Keystone Symposia on “Gut Microbiota Modulation of Host Physiology” earlier this year. During one of the conference dinners, an editor for Science happened to join the table where I was seated. We started speaking and they suggested that I submit the work for review at Science. At the time, I was reading Steven Pinker’s The Sense of Style and wanted to write the paper in ‘Classic Style’ to simply explain phage subdiffusion and appeal to a broader audience. I was very fortunate to be able to work once again with Merry Youle. We wrote a very stylized paper for Science, but after a two-week internal review we were told that although the work would likely be of great interest to the field, it was not broad enough for their general readership. So we quickly edited the paper and sent it to PNAS for review.
Our reviewers from PNAS were very helpful and suggested a number of experiments that strengthened the work, but they all hated the writing style and asked us to cut out many of the phage anthropomorphisms we had used (e.g., phage hunting bacteria). We spent another three months collecting and analyzing additional data and rewriting the paper, now with a more serious tone (e.g., search strategies instead of hunting). Overall, I felt our resubmitted paper was much stronger scientifically, even though it lost some readability. But the paper was still not accepted, and we had to go through a third revision. The final reviewer insisted on us including in vivo experiments (not something we could easily do for this paper, but we’re working on it) and continued to argue that the use of ‘search strategy’ obfuscated phage subdiffusion in mucus. Although we disagreed with this final point, the thought of going through another review was enough for us to concede, and we removed the use of this term from the paper. The rest of the editorial process was handled extremely well and we were in press at PNAS just three weeks later.

A Phoenix Rises from the Ashes: A new discovery emerges from the 2009 retraction.

This is a post in my continuing “Story Behind the Paper” series. This post is from Benjamin Schwessinger, Pamela Ronald, Rory Pruitt, Anna Joe, and Ofir Bahar.

A phoenix depicted in a book of legendary creatures by FJ Bertuch (1747–1822).
Via Wikipedia Commons – based on this

This is the story behind our report published today in Science Advances.

The Background

In Science Advances we report that one class of bacteria produces a previously undescribed, and long sought after, molecule recognized by plants carrying a specific receptor.

The story began in the 1970s, when Professor Gurdev Khush and colleagues demonstrated that a wild species of rice was immune to most strains of the Gram-negative bacterium Xanthomonas oryzae pv. oryzae (Xoo), the causal agent of a serious disease of rice globally. In the 1990s Ronald began studying the rice/Xoo interaction. Because both rice and Xoo are genetically tractable, the rice/Xoo biological system proved to be an excellent system for studies of the molecular mechanisms governing the plant immune response. In 1995, two postdoctoral fellows in Ronald’s lab at the University of California, Davis (Guoliang Wang and Wenyuan Song) reported that this rice immune response was controlled by a single receptor kinase, called XA21.

The predicted structure of the XA21 protein, with a predicted leucine rich repeat extracellular domain and an intracellular kinase domain, suggested that XA21 could sense a secreted microbial molecule and then activate an immune response.

A few years after the discovery of the XA21 receptor, the fly Toll and mouse Toll-like receptors (Tlr4) were shown to share striking structural similarities with XA21 and other plant receptors. The animal receptors also recognized and responded to microbial molecules. Together these discoveries demonstrated that plants and animals use similar mechanisms to protect against infection. Professors Bruce Beutler and Jules Hoffmann were awarded the 2011 Nobel Prize in Physiology or Medicine for their important work.

The Ronald laboratory then spent twenty years trying to identify the microbial molecule that is recognized by XA21. The research led to the identification of a number of microbial genes that are required for activation of XA21-mediated immunity (rax genes). These genes encode a tyrosine sulfotransferase, RaxST, and three components of a predicted type 1 secretion system: a membrane fusion protein, RaxA; an ATP-binding cassette transporter, RaxB; and an outer membrane protein, RaxC. raxST, raxA, and raxB are located in a single operon (raxSTAB). Based on these findings, we hypothesized that the activator of XA21-mediated immunity is a tyrosine sulfated, type 1-secreted protein.

We were excited about this idea because sulfation has emerged as an important posttranslational modification controlling receptor-ligand interactions. It is a common posttranslational modification of eukaryotic proteins and plays important roles in regulating development and immune responses. The importance of this area of research to biology and medicine is reflected in the recent report of a novel drug that blocks HIV infection. To achieve this breakthrough, the researchers exploited the observation that HIV binds tyrosine sulfated amino acids for cell entry (Gardner et al., 2015).

Despite a clear model and diverse supporting data suggesting that Xoo secretes a sulfated peptide, the identity of this molecule remained elusive.

In 2009, the Ronald laboratory reported that XA21 recognized a sulfated peptide. However, we later discovered major errors in this work and in 2013, we retracted the paper. We discussed these mistakes in several lectures, posts, and articles, including a Keystone symposium, Scientific American, Nature, and Schwessinger’s blog (here and here). The process with which we addressed the problems was highlighted as “Doing the right thing” by Retraction Watch, a blog that reports on retractions of scientific papers. The retraction was included as one of the top 10 retractions of 2013.

The new Discovery

Today, in Science Advances, we are delighted to report the identification of the microbial molecule that activates XA21-mediated immunity. As predicted, it is a tyrosine-sulfated protein. We named this microbial protein RaxX.

The rice immune receptor recognizes the bacterial molecule RaxX and initiates an appropriate immune response. Illustration by Kelsey Wood.

To isolate this molecule, postdoctoral fellow Rory Pruitt systematically created bacterial mutants carrying deletions near the RaxSTAB operon. He showed that one of the deletion mutants lost the ability to activate the XA21-mediated immune response. The deleted region encodes a small open reading frame that we named RaxX. Xoo strains lacking RaxX and Xoo strains that carry mutations in the single RaxX tyrosine residue (Y41) are able to evade XA21-mediated immunity. Postdoctoral fellow Anna Joe, together with collaborators at the University of Texas, Austin and at the Joint Bioenergy Institute in Emeryville, showed that Y41 of RaxX is sulfated by the prokaryotic tyrosine sulfotransferase RaxST. Postdoctoral fellow Benjamin Schwessinger, graduate student Nick Thomas and collaborators showed that sulfated, but not nonsulfated, RaxX triggers hallmarks of the plant immune response in an XA21-dependent manner. A sulfated, 21–amino acid synthetic RaxX peptide (RaxX21-sY) is sufficient for this activity. Xoo field isolates that overcome XA21-mediated immunity encode an alternate raxX allele, demonstrating the co-evolution of host and pathogen. RaxX is highly conserved in many Xanthomonas species.

Our results indicate that the presence or absence of sulfation is decisive for the ability of RaxX to trigger XA21-mediated immunity.

The new insights gained from the discovery and characterization of RaxX may be useful for the engineering of resistant crop varieties and for the development of therapeutic reagents that can block microbial infection of both plants and animals.

The rice XA21 receptor kinase, the first innate immune receptor discovered in plants or animals, provides resistance against Xanthomonas oryzae pv oryzae through recognition of RaxX, a tyrosine-sulfated protein secreted by the bacterium.

Illustration by Maurice Vink

Notes on the publication process

“The scientific life is the most complex of all to write about. In the case of scientists, impulse becomes compulsion.” – Carol Shields

After we discovered mistakes in our previous paper, we spent several years correcting the scientific literature both by retracting the original Science paper (Lee et al. 2009) and by following up with publications to further correct the literature (Bahar et al. 2014). We made extra efforts to control the results in this current report.

Wrestling with the retraction and discovering the new molecule in rapid succession was an enormous challenge. Here we share some of the lessons learned.

Pamela Ronald, Professor, Department of Plant Pathology and the Genome Center, UC Davis; Director of Grass Genetics, the Joint Bioenergy Institute:

I would not wish a retraction on anyone. Scientists are supposed to catch their mistakes before publication. Still, I am astonished to conclude that the process has in some ways been positive.

On an administrative level, the lab is running more efficiently. I have instituted new practices for the lab: created duplicate stocks of key strains (validated and maintained by the lab manager), mandated electronic notebooks for each lab member, and required that all new assays be validated by three independent researchers before publication.

But the best part of this bad situation has been working with this particular team. It has been an immense privilege to watch each person work through the situation in their own way, collaborate, and make new discoveries. Respect for each other and for the scientific process was paramount. After figuring out what went wrong (no easy task), they tried not to look back. They did not give up, even when it would have made sense to do so. Their persistence and optimism in the face of this daunting challenge buoyed all of our spirits. I will always be in awe of their work and will always be grateful.

Equally stunning was the supportive and kind response from the scientific community. We received many letters of encouragement – even from complete strangers. It helped us keep going.

There are still hills to climb. Some scientists may be extra skeptical of results from my lab for a long time to come. For example, in a critique of our submission, one of the reviewers asked, “how do we know the strains weren’t mixed up again this time?”

Rory Pruitt, postdoctoral scholar in the Ronald lab.

I was only a few months into my postdoc when I became convinced that the majority of the Ax21 story was incorrect (Ax21 was the proposed elicitor of XA21-mediated immunity in the retracted papers). My mind was filled with questions. How could this happen? What results can I believe? Admittedly, the biggest question that hounded me was “Should I be looking for a new job?” There were a few key factors that led to my decision to stay in the lab. I think these factors were also critical to this story working out as a “success.”

Early on, I went to Pam with some of my doubts. It was terrifying to approach my new boss and say I didn’t believe some of her published work (including a Science paper!). But I needed to know that I could be honest with her and not feel pressured into only showing results that fit the established model. Pam listened to my concerns and those of others in the lab. Most importantly, she showed that she was committed to getting the story right and correcting the literature if need be.

In addition to Pam, there was a great team of postdocs and graduate students who were equally devoted to correcting the science. At times it seemed a long, painful process with little reward (there’s not a good space on a CV for working towards a retraction). Nevertheless, it needed to be done so that we and other labs could move forward. I was encouraged by Ofir, Ben, and others who worked persistently on this.

A final factor in my decision to stay is the prospect of new discovery. If Ax21 isn’t the activator of XA21-mediated immunity, what is? Maybe we can find it! It’s that hope of new discovery that keeps us coming back to the lab bench. My postdoctoral experience has had some highs and lows, but I am glad I stuck it out. With persistence, enthusiasm, and a good team committed to reliable science, we were able to not only correct earlier mistakes but also move forward.

Benjamin Schwessinger, former Ronald Laboratory postdoctoral scholar and now independent research fellow in Australia, at the Australian National University in Canberra.
You have much to lose as an early career researcher if you are thrust into a situation where results cannot be reproduced. In a hyper-competitive environment, irreproducible results that you are trying to build on are a big problem, no matter how smart, privileged, and gifted you are. Lengthy delays in publishing as a postdoc can cause great harm to a career. Here are the main factors that made us successful in the face of adversity.

(Be lucky) have your own funding

Your own funding makes you financially and also scientifically more independent. It ensures your academic freedom. I was grateful to have been supported independently by the Human Frontier Science Program. It made me bolder and braver in speaking out. I was able to choose to stay or go. Because of the team I believed in I decided to stay!

Get confidential outside advice

Getting some confidential, impartial outside advice on how to approach this problem is very important. Many senior figures have most likely seen similar cases in the past and have more insight. Following through with this advice is a totally different matter. I decided to stay!


Work through it together as a team. Build on each other’s strength and talk about all possibilities. Repeat each other’s experiments with all required controls. Invite well respected figures in the field to independently test (and confirm) core experiments.

Admit mistakes and retract
Everyone makes mistakes. They are part of scientific discovery, and science has to be self-correcting. Retractions are an integral part of this process. Not to retract is NOT an option! It obstructs all future progress in the subject matter.

Follow the data

Do controls, repeats, and repetitions of conclusive experiments. Seeing is better than believing.

Ofir Bahar, former Ronald Laboratory postdoctoral scholar and now principal investigator, Plant-Microbe Interaction Research Group, the Volcani Center, Israel:

I remember the day, in early 2013, when we were driving back to Davis from a happy and relaxed baby shower at Benjamin’s place in Oakland. Rory mentioned to me, “You know, I deleted an upstream and a downstream region to raxSTAB. The downstream mutant was no different than wild type, but the upstream mutant forms long lesions on XA21 plants…”

This was the turning point; I immediately knew this was a big discovery and a major breakthrough for the lab.

But before that moment, we were a bunch of enthusiastic post docs that just loved doing science. We wrote these nice proposals to get our fellowships, based on the amazing story of the rice immune receptor XA21 and its (thought to be) elicitor Ax21.

It was a fascinating story we were all so excited about having read it in Science. Of course we joined the Ronald lab to follow up on this initial discovery, but well… the building upon part did not work as we all might have wished. We had to dig deep, real deep, to figure out what was going on and what went wrong before our arrival to the lab. So, a year….. year-and-a-half in our new positions we finally reached the ultimate conclusion that there was a big hole in the model – there’s no elicitor! Or, there is, but it’s not Ax21 and we don’t have a clue what the identity of this molecule might be. It felt like we were thrown back 10 years, to 2004 with the da Silva paper just published describing the requirement of the three Xanthomonas genes RaxST, RaxA and RaxB for XA21 immune activation.

Those were ‘dark ages’ and difficult times. Understanding that most of the time you invested so far was, at least in practical terms (e.g. publications), for nothing, and that there is no biological model to work on, but that it needs a total reboot. To be honest I was feeling a bit worried at that time for my scientific career. But then, a series of exciting discoveries (including some that are not published yet) gave me hope again. Well… isn’t this how science goes, bad, bad, bad, bad, good, bad, bad, bad, good and so on. I remember Pam telling me: “you know why I love a big group? There has got to be some positive results coming all the time.”

Later, a few months after Rory shared his finding with me, we already knew what it was, and we were very certain, this is the ONE. Unfortunately, or luckily, I was offered a position in my home country and I gladly accepted it. So I actually wasn’t there for the flower stage (you know… the decorations), but I was very happy to have been there when the bud of this beautiful flower-to-be emerged. Every time I think of this story it’s like, WOW, can you believe all this has happened in just 3-4 years, unbelievable.

My lesson is: never lose hope, be critical, believe it when you see it, work on multiple projects, enjoy science, and openly share science.

Anna Joe, postdoctoral scholar in the Ronald lab.

I was in my final year of grad school and looking for a postdoc position in early 2013. The Ronald lab was at the top of my wish list because I was fascinated by the Ax21 story in Science 2009. But just before I applied for a position in the Ronald lab I learned that something went wrong with Ax21 and that the original paper would be retracted. Many thoughts crossed my mind. The main one was “Do I still want to join the Ronald lab?”. Actually it was easy to answer the question once I spoke with Pam about it and talked with her lab members during the visit for my formal interview. “Yes, I’d like to work in the lab which just retracted two papers”. This for sure sounds crazy to most people. However, the whole experience of my visit gave me many reasons to join the Ronald lab. Correction of errors is a part of science (I knew this because I had also had a difficult time tracking down a plant mix-up problem before) but not many people are brave enough to admit mistakes. Pam and all lab members honestly and clearly stated to me what the errors were and how they verified the problems. They communicated well with each other, shared ideas freely, and respected each other’s opinions. Their open-mindedness and transparency attracted me.

On top of that I was very curious about the unexplored, new XA21 activator. All other lab members might have felt the same curiosity and channeled its energy to continuously work through the problems during the last several years. Although I did not share the “dark period”, I could see everybody in the lab was persistent with the common effort to correct the science. I experienced incredibly good teamwork and great collaboration. All of those are the driving forces of our success. Finally, I’d like to mention that we could not have made it without the support and encouragement from the scientific community. Many scientists shared their thoughts and advice and were rooting for us. Most collaborators unhesitatingly complied with our requests for assistance. They helped us not only “do the right thing”, but also do better science.

Story behind the paper: Backbones of evolutionary history test biodiversity theory for microbes

This is a guest post in my series “The Story Behind the Paper”. The post is by James O’Dwyer about his paper (coauthored with Steven Kembel and Tom Sharpton) in PNAS entitled “Backbones of evolutionary history test biodiversity theory for microbes”.

Backbones of evolutionary history test biodiversity theory for microbes
This paper has its roots going back a few years, and it all started off fairly innocuously.  A previous collaboration with Steve Kembel and Jessica Green resulted in this earlier paper, where we had the lofty goal of encouraging microbial ecologists to throw out slightly less data, and also attracted Jonathan’s attention for our microbiome figures.  One of the central questions in ecology is to explain and understand patterns of biodiversity: for example, by quantifying the diversity of a local community (“alpha” diversity), or similarity between multiple local communities (“beta” diversity).  In microbial ecology it is common to use evolutionary history to quantify these measures. But both phylogenetic alpha and beta-diversity tend to change systematically with increasing sample size, making it difficult to compare results for samples of different sizes.
Our idea in the earlier paper was to generate a fast way to compute a null prediction for these metrics for phylogenetic alpha and beta diversity—i.e. this would provide a way to standardize the results for sample size, and hence we could use full samples rather than smaller, rarefied samples.  The solution is relatively simple, and involved a phylogenetic analogue of the Species Abundance Distribution (SAD), which we called the Edge-length Abundance Distribution (EAD).  In comparison with the SAD, this distribution replaces species units with subclades of a phylogenetic tree, replaces species abundances with subclade size, and inserts branch length weightings in a specific way.
The present day
Job done.  So how did this lead to a new paper?  Well, this first study generated something slightly mysterious to us.  In theory, the EADs we computed from empirical data could have taken any form they wanted to—and yet for various microbiome habitats, they all seemed to display a very distinct power law scaling. Translated into a more concrete consequence, the form of the EAD was such that phylogenetic diversity typically increased as a power law function of sample size.  There’s a history in ecology of looking for (and sometimes finding) behavior that both takes on a power law scaling, and is also universal across multiple systems, fitting with a general sense that some patterns may be emergent and independent of much of the underlying variation between communities. There’s also a history of looking for (and sometimes finding) power law scaling in evolutionary trees, for example in the number of species per genus, which has often been claimed as a power law.  Here we had found a link with these older ideas, with a nice combination of new factors.  First, we weren’t relying on human definitions of species, which could certainly be biased towards generating power law scaling artificially (e.g., the principle of balance).  Second, we had large numbers, so that these scaling behaviors spread over multiple orders of magnitude.  Third, there was an untapped world of microbial sequence data to look at to see whether these patterns extended into microbiology.
With Tom and Steve, we combined these ideas to set up the empirical side of this new paper: expand the original study across a broader range of habitats, test whether the patterns are robust to different alignment and inference methods, and see whether the same scaling behavior holds up for this new range of samples.  Which indeed it did—Figures 1 and 3 in the new paper show that this power law scaling is present across multiple microbial habitats.  
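
To make "power law scaling with sample size" concrete, here is a minimal sketch of how one might estimate a scaling exponent from diversity measured at several sample sizes, using an ordinary least-squares fit on log-log axes. The function names and the numbers are invented for illustration, and this is not the fitting procedure used in the paper.

```python
import math


def fit_power_law(sample_sizes, diversity):
    """Fit diversity = c * n**z by least squares on log-log axes.

    Returns (c, z). Assumes all inputs are strictly positive.
    """
    xs = [math.log(n) for n in sample_sizes]
    ys = [math.log(d) for d in diversity]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                        # slope on log-log axes
    c = math.exp(mean_y - z * mean_x)    # intercept, back on the raw scale
    return c, z


def expected_diversity(n, c, z):
    """Null expectation for diversity at sample size n under the fitted law."""
    return c * n ** z


# Made-up data that follow an exact power law with exponent 0.6.
sizes = [100, 1_000, 10_000, 100_000]
made_up_pd = [3.0 * n ** 0.6 for n in sizes]
c_hat, z_hat = fit_power_law(sizes, made_up_pd)
print(f"c = {c_hat:.3f}, z = {z_hat:.3f}")
```

A fitted curve of this kind is what allows samples of different sizes to be compared against a common expectation instead of being rarefied down to the smallest sample.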

Just knowing that this distribution takes a power law form is already useful on its own, because (again) it defines the null expectations for the way phylogenetic alpha and beta diversity change with sample size.  But these results still left a number of open questions, centering around whether this could also give us some insight into what models of biodiversity could be consistent with what we were seeing.  Could these scaling patterns provide evidence for whether given ecological and evolutionary scenarios had strongly influenced a community?

Coarse-graining: reducing the resolution of phylogenetic trees
The first modeling approach we considered is neutral theory. Neutral models have provided the basic null models in fields stretching from population genetics and ecology to cultural evolution and the social sciences. In common is the key assumption that selective differences are irrelevant for predicting large-scale patterns. If the power law scaling is just an inevitability, an ecological version of Benford’s law, it seemed likely that it might be just a consequence of neutrality, with all of the variation and mechanism somehow washing out.  Is it possible that these observed phylogenetic patterns are driven by this most basic, neutral model of biodiversity?  The answer turns out to be no: at least using the vanilla version of the neutral theory, we don’t reproduce these scaling behaviors.
Next, we got a little creative. When working with trees generated by neutral processes, we were thinking of the Kingman coalescent, i.e., a model of tree structure that works backwards in time, coalescing pairs of lineages at each node.  There’s a one-parameter family of coalescent models generalizing the Kingman coalescent, with the unifying feature that more than two lineages can coalesce at each node.  Viewed forward in time, one lineage can burst into many. This generalized family, the Lambda-coalescent, produces precisely the power law EAD (known in that context as a site-frequency spectrum) we were looking for.
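For readers who want to see the pairwise (Kingman) model concretely, the sketch below simulates one tree backwards in time and tallies total branch length by the number of tips beneath each branch. It uses the standard Kingman rule that, with k lineages, the wait to the next merger is exponential with rate k(k-1)/2 in coalescent time units. The function name `kingman_spectrum` and the bookkeeping are assumptions for illustration; this is not the analysis code from the paper, and it does not implement the Lambda-coalescent.

```python
import random

def kingman_spectrum(n_tips, rng):
    """Simulate one Kingman tree; return ({clade_size: total branch length}, tree height)."""
    # Each active lineage is (number of tips below it, time at which it appeared).
    lineages = [(1, 0.0)] * n_tips
    spectrum = {}
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, the wait until the next pairwise merger is exponential
        # with rate k choose 2 (time runs backwards from the present).
        t += rng.expovariate(k * (k - 1) / 2)
        merged_size = 0
        # Pop the higher index first so the lower index stays valid.
        for idx in sorted(rng.sample(range(k), 2), reverse=True):
            size, born = lineages.pop(idx)
            # This lineage's branch spans from its appearance to the merger.
            spectrum[size] = spectrum.get(size, 0.0) + (t - born)
            merged_size += size
        lineages.append((merged_size, t))
    return spectrum, t

spectrum, height = kingman_spectrum(10, random.Random(1))
print(sorted(spectrum.items()), height)
```

Averaging such spectra over many simulated trees gives the neutral expectation against which an empirical EAD could be compared.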
These generalized coalescent trees have previously been used to understand population processes with a skewed offspring distribution, where there is a significant probability that an organism has a large number of offspring, and this matches the idea of multiple lineages coalescing. But for our evolutionary trees that idea of instantaneous, multiple branching seemed unlikely. At a fine-grained level, branches in our evolutionary trees ought to split into two, driven by cell division and subsequent diversification. This is also what our tree inference algorithms are designed to find, even when our sequence data likely isn’t sufficient to resolve all polytomies.  So how could these generalized coalescent trees possibly be consistent with our empirical trees?  

Instead of trying to resolve as many polytomies as possible, we decided to go in the other direction. We imagined reducing the resolution at which we could distinguish the order of branching events. Applying this ‘coarse-graining’, we would certainly generate polytomies, as fast bursts of branching and multiple nodes collapse. Still, much like the EAD, there was no guarantee for what the distribution of polytomy sizes would be after this coarse-graining, or whether it would match these theoretical models. So our second surprise is that the distribution of burst sizes is also a power law, qualitatively consistent with the same distribution in the Lambda-coalescent.
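One simple way to picture coarse-graining is to collapse every internal branch shorter than some resolution, so that closely spaced splits merge into a single polytomy. The sketch below does this on a toy tree and then lists the number of children at each remaining internal node. The threshold rule, the tuple encoding, and the names `coarse_grain` and `burst_sizes` are assumptions for illustration; the procedure used in the paper may differ.

```python
def coarse_grain(node, eps):
    """Collapse internal branches shorter than eps, merging children into the parent.

    A node is (branch_length, children); a tip has an empty children list.
    """
    length, children = node
    new_children = []
    for child in children:
        c_length, c_children = coarse_grain(child, eps)
        if c_children and c_length < eps:
            # Unresolvable split: promote the grandchildren and extend their
            # branches so that root-to-tip distances are preserved.
            new_children.extend((c_length + g_len, g_kids) for g_len, g_kids in c_children)
        else:
            new_children.append((c_length, c_children))
    return (length, new_children)

def burst_sizes(node):
    """List the number of children at every internal node."""
    _, children = node
    sizes = [len(children)] if children else []
    for child in children:
        sizes.extend(burst_sizes(child))
    return sizes

# Toy tree with two very short internal branches (a rapid burst of splits).
toy = (0.0, [(0.01, [(0.01, [(1.0, []), (1.0, [])]), (1.01, [])]), (1.02, [])])
print(burst_sizes(toy))                      # fully resolved: all binary splits
print(burst_sizes(coarse_grain(toy, 0.05)))  # burst collapsed to one polytomy
```

On the toy tree, three binary splits collapse into a single four-way polytomy once the resolution is coarser than the short internal branches.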

So this seems to be the beginning of a very nice story, with a lot of open questions.  Empirical trees display bursts of branching, which quickly collapse to polytomies under coarse-graining, and the distribution of sizes of these bursts is a power law.  The Lambda-coalescent is likely not the end of the story, but at least suggests that this distribution is tied together with the scaling behavior of phylogenetic diversity.  
What’s next?  Certainly lots of empirical questions.  Does this behavior extend over an even broader range of samples?  And will it still hold if we have better, longer sequence data?  There are also theoretical questions, mostly centering around whether we can relate parsimonious but mechanistic models to the bursty tree structures, and how best to evaluate and compare these models.  One take-home message stands out for me.  Simplified models of biodiversity, like neutral models and their generalizations, likely won’t ever capture the fine-grained dynamical behavior of an ecological community. But they might just tell us something about coarse-grained dynamical behavior, and coarse-grained phylogenies could be a nice part of this story.  Let’s see if coarse-grained patterns can be matched with coarse-grained process.

Blind trust in unblinded observation in Ecology, Evolution and Behavior (Guest Post by Melissa Kardish)

This is a guest post from Melissa Kardish – a PhD student at UC Davis – writing about a recent paper from work she did at her prior position.  The citation for the paper she is writing about is below:
Kardish MR, Mueller UG, Amador-Vargas S, Dietrich EI, Ma R, Barrett B and Fang C-C (2015) Blind trust in unblinded observation in Ecology, Evolution, and Behavior. Front. Ecol. Evol. 3:51. doi: 10.3389/fevo.2015.00051
Here is her post.

Blind trust in unblinded observation in Ecology, Evolution and Behavior

We recently published our study in Frontiers in Ecology and Evolution where we found that a remarkable number of studies that could be affected by observer bias didn’t indicate whether or not they blinded their research. In fact only 13.3% of studies reported this:

We tried to make this a very transparent study. In addition to journal level data in the main article, we include in our supplemental material a table with the score for every article we read for this study (a summary of these scores per journal can be found in Figure S2 included here). If anything, our results under-represent the number of studies that could have been scored blind (the real rate of reporting/use of blind observation is probably lower than the 13.3% we report). For instance, we did not consider the scoring of microsatellite markers to have potential for bias (such studies were scored as unlikely to have observer bias). However, we did identify one microsatellite-based study that scored its markers blind and reported this in its methods (and was therefore scored as “blind” in our study).  We also considered a study blind in its entirety for the purposes of our scoring if blinding was reported for only one aspect, even if other experiments could also have been influenced by observer bias (check out our supplemental methods for more ways we scored conservatively in our study).

We recognize that not all EEB studies can be blinded due to a variety of logistical or hypothesis driven reasons; however, we encourage such studies to accurately report this rationale and consider and attempt to minimize observer bias when designing experiments.
Thus far we have had a great response from the surveyed journals. Many of them have notified their editors about the lack of blind observation that we found reported in their journal. One journal has even notified us of plans already in place to address this issue at their next editorial board meeting.
We’re excited to have this work out there and hope this will inspire people to blind their studies and accurately report the science they are doing. We’re also excited to have the study published in an open-access format where we hope the encouragement for blind observation can reach all levels of science. Finally, as reporting of science in our fields improves in the coming years, we hope this study can serve as a template to address other potential concerns in experimental design and reporting.

Story Behind the Paper: Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads (by Rogan Carr and Elhanan Borenstein)

Here is another post in my “Story Behind the Paper” series where I ask authors of open access papers to tell the story behind their paper.  This one comes from Rogan Carr and Elhanan Borenstein.  Note – this was crossposted at microBEnet.  If anyone out there has an open access paper for which you want to tell the story — let me know.

We’d like to first thank Jon for the opportunity to discuss our work in this forum. We recently published a study investigating direct functional annotation of short metagenomic reads that stemmed from protocol development for our lab. Jon invited us to write a blog post on the subject, and we thought it would be a great venue to discuss some practical applications of our work and to share with the research community the motivation for our study and how it came about.

Our lab, the Borenstein Lab at the University of Washington, is broadly interested in metabolic modeling of the human microbiome (see, for example, our Metagenomic Systems Biology approach) and in the development of novel computational methods for analyzing functional metagenomic data (see, for example, Metagenomic Deconvolution). In this capacity, we often perform large-scale analysis of publicly available metagenomic datasets as well as collaborate with experimental labs to analyze new metagenomic datasets, and accordingly we have developed extensive expertise in performing functional, community-level annotation of metagenomic samples. We focused primarily on protocols that derive functional profiles directly from short sequencing reads (e.g., by mapping the short reads to a collection of annotated genes), as such protocols provide gene abundance profiles that are relatively unbiased by species abundance in the sample or by the availability of closely-related reference genomes. Such functional annotation protocols are extremely common in the literature and are essential when approaching metagenomics from a gene-centric point of view, where the goal is to describe the community as a whole.

However, when we began to design our in-house annotation pipeline, we pored over the literature and realized that each research group and each metagenomic study applied a slightly different approach to functional annotation. When we implemented and evaluated these methods in the lab, we also discovered that the functional profiles obtained by the various methods often differ significantly. Discussing these findings with colleagues, some further expressed doubt that such short sequencing reads even contained enough information to map back unambiguously to the correct function. Perhaps the whole approach was wrong!

We therefore set out to develop a set of ‘best practices’ for our lab for metagenomic sequence annotation and to prove (or disprove) quantitatively that such direct functional annotation of short reads provides a valid functional representation of the sample. We specifically decided to pursue a large-scale study, performed as rigorously as possible, taking into account both the phylogeny of the microbes in the sample and the phylogenetic coverage of the database, as well as several technical aspects of sequencing like base-calling error and read length. We have found this evaluation approach and the results we obtained quite useful for designing our lab protocols, and thought it would be helpful to share them with the wider metagenomics and microbiome research community. The result is our recent paper in PLoS One, Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads.

The performance of BLAST-based annotation of short reads across the bacterial and archaeal tree of life. The phylogenetic tree was obtained from Ciccarelli et al. Colored rings represent the recall for identifying reads originating from a KO gene using the top gene protocol. The 4 rings correspond to varying levels of database coverage. Specifically, the innermost ring illustrates the recall obtained when the strain from which the reads originated is included in the database, while the other 3 rings, respectively, correspond to cases where only genomes from the same species, genus, or more remote taxonomic relationships are present in the database. Entries where no data were available (for example, when the strain from which the reads originated was the only member of its species) are shaded gray. For one genome in each phylum, denoted by a black dot at the branch tip, every possible 101-bp read was generated for this analysis. For the remaining genomes, every 10th possible read was used. Blue bars represent the fraction of the genome's peptide genes associated with a KO; for reference, the values are shown for E. coli, B. thetaiotaomicron, and S. pneumoniae. Figure and text adapted from: Carr R, Borenstein E (2014) Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads. PLoS ONE 9(8): e105776. doi:10.1371/journal.pone.0105776. See the manuscript for full details.

To perform a rigorous study of functional annotation, we needed a set of reads whose true annotations were known (a “ground truth”). In other words, we had to know the exact locus and the exact genome from which each sequencing read originated and the functional classification associated with this locus. We further wanted to have complete control over technical sources of error. To accomplish this, we chose to implement a simulation scheme, deriving a huge collection of sequence reads from fully sequenced, well annotated, and curated genomes. This scheme allowed us to have complete information about the origin of each read and allowed us to simulate various technical factors we were interested in. Moreover, simulating sequencing reads allowed us to systematically eliminate variations in annotation performance due to technological or biological effects that would typically be convoluted in an experimental setup. For a set of curated genomes, we settled on the KEGG database, as it contained a large collection of consistently functionally curated microbial genomes and it has been widely used in metagenomics for sample annotation. The KEGG hierarchy of KEGG Orthology groups (KOs), Modules, and Pathways could then serve as a common basis for comparative analysis. To control for phylogenetic bias in our results, we sampled broadly across 23 phyla and 89 genera in the bacterial and archaeal tree of life, using a randomly selected strain in KEGG for each tip of the tree from Ciccarelli et al. From each of the selected 170 strains, we generated either *every* possible contiguous sequence of a given length or (in some cases) every 10th contiguous sequence, using a sliding window approach. We additionally introduced various models to simulate sequencing errors. This large collection of reads (totaling ~16Gb) was then aligned to the KEGG genes database using a translated BLAST mapping.
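The sliding window step described above can be sketched in a few lines. This is a minimal illustration, assuming a linear genome held as a plain string and forward-strand reads only; the function name `sliding_window_reads` and its parameters are hypothetical, and the error models mentioned above are not included.

```python
def sliding_window_reads(genome, read_length, step=1):
    """Yield (start, read) for every step-th contiguous window of read_length bases."""
    # The last valid start leaves exactly read_length bases before the end.
    for start in range(0, len(genome) - read_length + 1, step):
        yield start, genome[start:start + read_length]

# Toy example: a 10-base "genome" and 4-base reads.
genome = "ACGTACGTAC"
all_reads = list(sliding_window_reads(genome, 4))             # every possible read
sparse_reads = list(sliding_window_reads(genome, 4, step=3))  # every 3rd read
print(len(all_reads), len(sparse_reads))
```

Keeping the start coordinate with each read is what provides the ground truth: the read can later be matched to the gene, and hence the annotation, at that locus. With `read_length=101` and `step=10`, the same function would correspond to the "every 10th read" sampling mentioned above.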
To control for phylogenetic coverage of the database (the phylogenetic relationship of the database to the sequence being aligned) we also simulated mapping to many partial collections of genomes. We further used four common protocols from the literature to convert the obtained BLAST alignments to functional annotations. Comparing the resulting annotation of each read to the annotation of the gene from which it originated allowed us to systematically evaluate the accuracy of this annotation approach and to examine the effect of various factors, including read length, sequencing error, and phylogeny.

First and foremost, we confirmed that direct annotation of short reads indeed provides an overall accurate functional description of both individual reads and the sample as a whole. In other words, short reads appear to contain enough information to identify the functional annotation of the gene they originated from (although not necessarily the specific taxon of origin). Functions of individual reads were identified with high precision and recall, yet the recall was found to be clade dependent. As expected, recall and precision decreased with increasing phylogenetic distance to the reference database, but generally, having a representative of the genus in the reference database was sufficient to achieve a relatively high accuracy. We also found variability in the accuracy of identifying individual KOs, with KOs that are more variable in length or in copy number having lower recall. Our paper includes an abundance of data on these results, a detailed characterization of the mapping accuracy across different clades, and a description of the impact of additional properties (e.g., read length, sequencing error, etc.).
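For readers less familiar with the two accuracy measures used above, here is a minimal sketch of per-KO precision and recall computed from paired lists of true and predicted read annotations. The function name, the use of `None` for unannotated reads, and the KO labels in the toy data are assumptions for illustration; the paper's exact definitions may differ in detail.

```python
def precision_recall(true_kos, predicted_kos, ko):
    """Precision and recall for one KO over paired per-read annotations."""
    pairs = list(zip(true_kos, predicted_kos))
    tp = sum(1 for t, p in pairs if t == ko and p == ko)  # correctly assigned
    fp = sum(1 for t, p in pairs if t != ko and p == ko)  # wrongly assigned to this KO
    fn = sum(1 for t, p in pairs if t == ko and p != ko)  # missed reads from this KO
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

# Toy data: five reads, None means "no KO" (placeholder labels, not real results).
truth = ["K00001", "K00001", "K00002", "K00001", None]
calls = ["K00001", None, "K00001", "K00001", None]
print(precision_recall(truth, calls, "K00001"))
```

In this toy case two of the three reads called K00001 are correct (precision 2/3) and two of the three true K00001 reads are recovered (recall 2/3).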

A principal component analysis of the pathway abundance profiles obtained for 15 HMP samples and by four different annotation protocols. HMP samples are numbered from 1 to 15 according to the list that appears in the Methods section of the manuscript. The different protocols are represented by color and shape. Note that two outlier protocols for sample 14 are not shown but were included in the PCA calculation. Figure and text adapted from: Carr R, Borenstein E (2014) Comparative Analysis of Functional Metagenomic Annotation and the Mappability of Short Reads. PLoS ONE 9(8): e105776. doi:10.1371/journal.pone.0105776. See the manuscript for full details.

Importantly, while the obtained functional annotations are in general representative of the true content of the sample, the exact protocol used to analyze the BLAST alignments and to assign functional annotation to each read could still dramatically affect the obtained profile. For example, in analyzing stool samples from the Human Microbiome Project, we found that each protocol left a consistent “fingerprint” on the resulting profile and that the variation introduced by the different protocols was on the same order of magnitude as biological variation across samples. Differences in annotation protocols are thus analogous to batch effects from variation in experimental procedures and should be carefully taken into consideration when designing the bioinformatic pipeline for a study.

Generally, however, we found that assigning each read the annotation of the top E-value hit (the ‘top gene’ protocol) had the highest precision for identifying the function from a sequencing read, and only slightly lower recall than methods enriching for known annotations (such as the commonly used ‘top 20 KOs’ protocol). Given our lab interests, this finding led us to adopt the ‘top gene’ protocol for functionally annotating metagenomic samples. Specifically, our work often requires high precision for annotating individual reads for model reconstruction (e.g., utilizing the presence and absence of individual genes) and the most accurate functional abundance profile for statistical method development. If your lab has similar interests, we would recommend this approach for your annotation pipelines. If, however, you have different or more specific needs, we encourage you to make use of the datasets we have published along with our paper to help you design your own solution. We would also be very happy to discuss such issues further with labs that are considering various approaches for functional annotation, to assess some of the factors that can impact downstream analyses, or to assist in such functional annotation efforts.
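The ‘top gene’ idea can be sketched as follows: for each read, keep only the hit with the lowest E-value and report that gene's KO. This is a minimal illustration, not the pipeline from the paper. The tuple layout of `hits`, the `gene_to_ko` mapping, the tie-breaking rule, and returning `None` when the top gene has no KO are all assumptions, and the gene and KO labels are placeholders.

```python
def top_gene_annotation(hits, gene_to_ko):
    """Annotate each read with the KO (if any) of its lowest-E-value hit.

    hits: iterable of (read_id, gene_id, evalue) tuples.
    gene_to_ko: dict mapping gene_id to a KO label; genes without a KO are absent.
    """
    best = {}
    for read_id, gene_id, evalue in hits:
        # Ties keep the first hit seen; a real pipeline needs an explicit rule.
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (gene_id, evalue)
    return {read_id: gene_to_ko.get(gene_id) for read_id, (gene_id, _) in best.items()}

hits = [
    ("r1", "geneA", 1e-20), ("r1", "geneB", 1e-5),
    ("r2", "geneC", 1e-3), ("r2", "geneB", 1e-8),
    ("r3", "geneC", 1e-30), ("r3", "geneA", 1e-10),
]
gene_to_ko = {"geneA": "K00001", "geneB": "K00002"}  # geneC has no KO
print(top_gene_annotation(hits, gene_to_ko))
```

Read `r3` shows the design trade-off: its best hit has no KO, so the read stays unannotated even though a weaker hit does carry one. A protocol that enriches for known annotations would instead pick up the weaker hit, which is one way such protocols could gain recall at some cost in precision.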

Story behind the paper: Bonnie Baxter on "A tale of salt and gender" #STEMWomen #Halophiles

After posting A tale of salt and gender: participation of women in halophile research I sent the post to Bonnie Baxter, one of the authors of the article I discussed, and I asked if she would be interested in writing a guest post about the “Story Behind the Paper” (for which I have a whole series).  I am so so pleased that she said yes.  I have followed Bonnie’s work for many years but this is her first guest post here.  I hope there will be more.  She is a wonderful and brilliant scientist and educator.

Guest Post by Bonnie Baxter
Salty Sisters: The Women of Halophiles

Bonnie Baxter and Nina Gunde-Cimerman at the north arm of Great Salt Lake (2008)
I was drawn to the western US, the extreme landscapes, and ended up at the only liberal arts college in Utah. I had wanted a career doing science with undergraduates, and I set about exploring the microbiota of Great Salt Lake. Since few had studied this incredible spot, I quickly became the go-to person for studies on the lake, and these collaborations and grant projects eventually evolved into an organization I direct called Great Salt Lake Institute. We are dedicated to research, scholarship and education efforts on Great Salt Lake.
There had been no microbiology done on Great Salt Lake since 1979. This is why there was much excitement concerning our emerging data, and in 2004, I was invited to speak at the triennial International Halophiles conference in Slovenia. Halophiles are microbes that thrive at high salt, and the people who study them maintain an interesting balance of field-work and lab work. I had been to large meetings on DNA repair, DNA replication, nucleases and the like, but I had never met a group who were centered on a theme that connected them around the planet. 
From my first Halophiles meeting (I’ve since attended 2007 in Colchester UK, 2010 in Beijing and 2013 at University of Connecticut), I felt an unusual level of support from the elders of this group. And I noticed that, unlike the NASA meetings or biochemistry meetings I attended, there seemed to be a nice balance of men and women. There was a group of folks who had participated for a long time, without a membership organization, and these people maintained the notion of mentoring in the field. It is this spirit that drew all of us younger folk to participate. 
At each of the International Halophiles conferences, there is typically a history talk that brings forth work from past scientists from the field. After an evening in Beijing, I lamented to Aharon Oren, who studies microorganisms of the Dead Sea, that I found his history talk very engaging, but he seemed to overlook the contributions of women. So he challenged me to give the next history talk in Connecticut. By the next morning, at our shared 6 am breakfast, Aharon gave me a list of 20 or so women he thought had contributed great things to the halophile field. I had been given a challenge, and I accepted. I invited an accomplice to the project, Nina Gunde-Cimerman, from University of Ljubljana, Slovenia, and we began our research.
Bonnie Baxter says “My daughter thought it was more appropriate if we dressed this way for the talk.  But this is not the way female scientists do their work…”
Given my connection to Great Salt Lake, I’ve been asked to give an unusual number of keynote addresses and special talks (for a professor at a liberal arts college). I have often been the only female speaker at a meeting, or the only woman on a national committee. Since graduate school, I have held an interest in exploring why there are underrepresented groups in science. Why is retention in STEM fields different for men and women? Why are women underrepresented as physics or mathematics professors in the US, but hardly at all in Russia or Italy? This is what drove me to undergraduate science, fixing these problems and better understanding them. 
In the summer of 2013, Nina and I gave the opening talk at the International Halophiles conference at UConn, entitled “Salty Sisters: The Women of Halophiles.” The talk included our analysis of the participation of women in these conferences since 1978. 
After reading many studies of women underrepresented as speakers, we were shocked that our numbers were very different. It appeared that the halophile organizers had done an excellent job of gender inclusion, relatively speaking. Following the talk, and for weeks afterward, many scientists (male and female) approached us, telling us their experiences as women in the field or discussing how important this topic was.
Nina Gunde-Cimerman and Bonnie Baxter
We were thus inspired to publish a manuscript from the lessons we learned. As we looked at recent comparative studies, we learned more, in particular about the gender bias involved in speaker or author invitation. Please see the manuscript introduction for this important overview. Several publications pointed at the underrepresentation of women among invited speakers or authors of invited reviews. In problem-solving mode, Casadevall and Handelsman (2014) demonstrated that the inclusion of women on the organizing committee is critical to a balanced speaker docket.  
Bonnie, Aharon and Nina, Beijing 2010
What we learned as we analyzed the conference participation in our field is that we were doing quite well in gender balance of invited speakers: 36% of the speakers since 1978 were women! And indeed, women had been included in many of the organizing committees. We saw a 10-16% increase in female speakers when this was the case. We also came to understand that there was a small group of scientists who were committed to holding this conference with no organizational funding. This led to cooperation, collaboration, avid mentorship and strong friendships. This was a group that welcomed women, young scientists and peoples of all nations. I daresay that this is not always the situation in a particular field as the “village elders” may work by competition, not cooperation. These halophile elders, for example, worked to get external funding at each meeting to bring graduate students and post-docs to the conferences with little cost. 
Recent studies on gender bias in science are focused on numbers we can measure and methods to resolve the problem. Jon Eisen has been a strong proponent for what is becoming a national movement to require organizing committees to have written policies that include gender equity.  Scientists, male and female, should request this document and refuse to participate if it is not produced. 
The co-authors and I were so pleased to report a positive example in a sea of negative ones. I hope that this group of salty scientists can inspire others to build communities of inclusion as we learn from each other in exploring the natural world.

Guest post by Lizzy Wilbanks: Story behind the paper "Microscale sulfur cycling in the phototrophic pink berry consortia of the Sippewissett Salt Marsh"

Here’s another entry in the “Story behind the paper series”.  This one is from Lizzy Wilbanks, a co-advised PhD student in my lab (Twitter: @LizzyWilbanks)

A sulfurous symbiosis: Microscale sulfur cycling in the phototrophic pink berry consortia of the Sippewissett salt marsh 

Here’s the story behind my recent publication (with many talented coauthors) on the pink berries, the marvelous, macroscopic microbial aggregates of the Sippewissett.

A bit of background:

The wild microbe rarely eats alone. The microbial world is a jungle far more exotic than those we can see (metabolically and phylogenetically, at least), one rife with fierce competition, intimate cooperation, and intricately inter-dependent food webs. Eavesdropping on the metabolic conversations of uncultured microbes, though, remains a major technical challenge.  It requires tools to navigate the world from the microbe’s-eye view.

 Your binoculars just aren’t gonna cut it…  (image source )
In our recent paper, my co-authors and I describe how we were able to tune in to one such metabolic conversation, and look at a nutrient (‘biogeochemical’) cycle on the microbial scale. Here’s the back-story on how this project got started, and why I’m so excited to share our work with you!

Let’s get one thing straightened out:

‘Pink berries’ are a nickname for these pink colored microbial aggregates.  We’re not talking about fruit or frozen yogurt here.
(image source: my own, here, and here)

My first encounter….

I first encountered these eye-popping pink wonders in 2010 when I was a second-year grad student attending the Microbial Diversity summer course at the Marine Biological Laboratory in Woods Hole, MA.  Exploring the nearby Little Sippewissett Salt marsh for our first field trip, I stomped through the marsh grass into a muddy, sulfidic pool.
And people wonder why I think sulfide smells like beautiful summers and nostalgia?
(image source: my own)
Below the surface of the pool’s water, scattered across the sediment, was a truly magnificent carpet of pink blobs. 
(image source: my own)
After a bioinformatics-heavy start to grad school at UC Davis, I was dying to get my hands dirty with some fieldwork.  I was transfixed by the stinky, sulfidic marsh mud and these slimy pink aggregates. 
Me, awfully excited and really “diving-in” to the project.
Can’t remember how many times TA Annie Rowe and others had to fish me out of the mud that summer!

(image source: Melissa Cregger 🙂)
Course directors at the time, Dan Buckley and Steve Zinder, told me that these were the pink berries, balls of uncultured bacteria found in the Sippewissett marsh (and, so far as they knew, nowhere else). Summer students had been looking at the berries ever since the course was founded 40 years ago, they said, and they pointed me towards a pile of old course reports back at the lab.  

Berries: an MBL Microbial Diversity legacy.

These reports (now digitized and freely available) tell the tale of many happy, hard-working summers where students took a crack at these exotic looking blobs during their independent research mini-projects.  One of the most fun parts of this project has been meeting all of these “berry alumni”, both via email and in person, who are now scattered throughout the world. From helpful discussions, to sharing data and suggestions, and even digging up never-published 16S rRNA gene sequences from over a decade ago (thanks Bruce Paster and Jarrod Scott!), the berry-alums have helped lay the groundwork for our project and have been an amazing network of friends and collaborators.  
Our paper is a sequel, 20 years in the making, to the first and only other paper describing the pink berries.  Published in 1993 by MBL summer students Angelica Seitz and Tommy Nielsen with course faculty Dr. Jörg Overmann, this work described the berries as aggregates dominated by uncultured purple sulfur bacteria, anoxygenic phototrophs that oxidize hydrogen sulfide to sulfate (unlike cyanobacteria and green plants that oxidize water to oxygen). By spearing berries with oxygen microsensors, they found that the berries were such hot-spots of microbial activity that all oxygen was consumed just a few micrometers below the surface, creating a haven for anaerobic microorganisms.  
My obviously-not-to-scale cartoon of berry spearing with oxygen microsensors.
The purple sulfur bacteria give the berries their rosy hue with their photosynthetic pigments that have evolved to capture lower-energy, longer wavelength light (compared to that used by green phototrophs). 
Peering into the pink berries with a dissection microscope (real color!).
Pink blobs are islands of purple sulfur bacterial cells.

(image source: Verena Salman) 
With the introduction of 16S rRNA gene sequencing to the course in 1997, students discovered that, in addition to the conspicuous purple sulfur bacteria, the berries also harbored an abundance of an uncultured species related to sulfate reducing bacteria (sulfate -> sulfide).  The co-occurrence of putative sulfide-oxidizing purple sulfur bacteria and sulfate reducing bacteria spawned the hypothesis that these species might be metabolically interdependent, creating a “cryptic” sulfur cycle within the berries.  
The hypothesis! Purple sulfur bacteria in pink, sulfate reducing bacteria in green.
(image source: my own, modified version of Figure 9 from our paper) 
These sulfate reducing bacteria, though, had remained elusive and uncultured, and their activity undetected. This intriguing hypothesis about an “intraberry” sulfur cycle and metabolic cooperation (‘syntrophy’) remained untested, like so many other questions about the secret lives of uncultured microbes.

Project launch: Team berry 2010

Resolved to work on the pink berries for my mini-project, I banded together with fellow students and co-authors Ulli Jaekel and Parris Humphrey and, with the help of TAs Cristina Moraru and Rebekah Young, formed Team Berry 2010.  We began investigating the pink berries using DNA sequencing (16S, metagenomics), microscopy (FISH, TEM) and incubation studies.

The first few weeks at the MBL course were a bonanza of microbial excitement for me as a huge metabolism geek.  My mornings were spent trying to drink from the fire hose of information in lecture, followed by afternoons of lab, then dinner, more lab, and finally trying to piece together the day’s ideas over beers.

“Drinking from a fire hose” – another gem from PhDComics

Coming back from Dan Buckley and Victoria Orphan‘s lectures about the uses of stable isotopes in microbial ecology (reviewed here), I wondered if there was a way to use sulfur stable isotopes to track the cryptic sulfur cycle in the pink berries.  Brainstorming with Victoria, we devised a plan to conduct incubations with the pink berries using isotopically heavy sulfate (³⁴SO₄²⁻) as a stable isotope label.  The purple sulfur bacteria in the berries had abundant intracellular sulfur reserves, which typically come exclusively from reduced forms of sulfur (e.g. sulfide).  Our hope was that the sulfate reducing bacteria would reduce the heavy sulfate we added to heavy sulfide, which would then be oxidized by the purple sulfur bacteria and incorporated into their cells.

To track the flow of our isotopically labelled sulfur, we planned to image thin sections of the incubated berries using nanometer scale secondary ion mass spectrometry (nanoSIMS), an instrument commonly used by the Orphan lab for studying anaerobic methane oxidizing consortia.

Using the nanoSIMS to blast sections of pink berries with a focused cesium beam (~50 nm spot size)
and generate spatial maps of isotopic and elemental abundance.  
(image source: my own)

At that time, there was no precedent in the literature for using ³⁴S-isotope labeling in this way (most stable isotope probing experiments focused on carbon or nitrogen compounds), but Victoria’s group was interested in exploring this area for studying other tightly coupled sulfur-cycling systems.  The berries were an accessible testing ground. After a madcap two weeks of rush-orders, late nights, midnight berry slicing, and help from so many wonderful, patient TAs, our samples made a cross-country journey to the Orphan lab at Caltech where they, and thankfully the nanoSIMS, survived a minor earthquake.

The nanoSIMS beast in its subterranean lair @ the Caltech Microanalysis Center.
(image source: my own)

It was a wild ride during those final weeks, but just before the end, we got exciting results from Victoria’s nanoSIMS run that suggested our experiment had worked.  The preliminary nanoSIMS data showed accumulation of our sulfur isotope label (enrichment in ³⁴S compared to controls), and also provided evidence for carbon fixation (¹³C enrichment from labeled bicarbonate additions).

Can’t stop, won’t stop… the side-project that ate my thesis.

After returning to Davis, passing my qualifying exam and wrapping up prior projects, I was determined to get back to berries but wasn’t sure exactly how.  Victoria suggested that she could include berries in a collaborative NSF proposal on the biogeochemistry of tightly coupled sulfur cycling consortia (along with David Fike, Greg Druschel and Jesse Dillon).  When their funding came through, it held out the safety net I needed to work on berries full time.  With approval from Victoria and my co-advisers at Davis, I jumped!

Returning as a TA to the MBL Microbial Diversity course in 2011, I had a chance to conduct follow-up isotope experiments, and collaborate with course student and co-author Verena Salman on developing species-specific FISH probes to identify the spatial arrangements of the two berry symbionts.  Since then, I’ve followed up on our initial metagenomic sequencing to reconstruct near-complete genomes for the two berry symbionts, demonstrating the genetic potential for a complete sulfur cycle.

Figure 4 from our paper showing:
the sulfate reducing species (green rods, 16S rRNA gene FISH probe)
snuggled up with their metabolic partners,
the purple sulfur bacteria (pink/purple cocci, autofluorescence),
but not in the exopolymer matrix with
other cell types  (blue, DNA stain: DAPI).

In 2012, the final pieces of this project came together during a week of Sippewissett fieldwork with biogeochemistry collaborators David Fike, Greg Druschel, and their groups.  With high resolution geochemistry equipment aboard our homemade raft, we were able to link our existing microbiological measurements with microscale geochemical signatures in the berries.

(image sources: my own)


Using the pink berries, we demonstrate how an integrative microbiological and microgeochemical approach can be used to decrypt the microbial metabolic partnerships that drive sulfur cycling at the microscale. This methodology, which may ultimately be used to examine more complex ecosystems, offers direct evidence of syntrophic interspecies sulfur transfer. 
For more details on how all these different pieces came together, you’ll just have to check out our paper yourself!   


What do they taste like?
Mostly just salty, and a bit sandy 🙂

Are the pink berries found anywhere else?
Not really!  I’ve looked through the literature and chatted up loads of people, but no one’s ever reported seeing pink berry-type macroscopic consortia of purple sulfur bacteria and sulfate reducers.  There’s a description of microscopic pink berry-like aggregates in the chemocline of Lake Cadagno, and interestingly those aggregates’ sulfate reducing isolate (Cad626) is closely related to our PB-SRB1 sulfate reducing species.   Should you find berries somewhere else during your marshly peregrinations, email me!
Have you tried culturing them?
Yes!  My undergraduate students recently confirmed that we have an enrichment culture of the purple sulfur bacterial strain, and are working to purify it and submit it to a culture collection.  If you’re interested in working on it, I’m happy to send you a sample of the culture.  The sulfate reducer has, so far, resisted my efforts to coax it into culture, but culturing it hasn’t really been a major focus of my project (I’d wager it’s possible).
So wait, why are you studying them again?
  • My naturalist’s answer is: because they’re the pink, charismatic macrofauna of the microbial world. They’re nifty, and we don’t know what they do. But seriously…
  • Microbial metabolism is the engine that drives the nutrient (biogeochemical) cycling that shapes the health of both our planet and our bodies.
  • However, many key transformations in these cycles are carried out by microbial consortia over short spatiotemporal scales that elude detection by traditional analytical approaches. 
  • The berries provide a tractable, reproducible model microbial consortium for developing methods to eavesdrop on these otherwise cryptic metabolic conversations between wild microbes.
  • Understanding the biosignatures (e.g. sulfur isotopic fractionation) produced by microbial communities like the pink berries improves our ability to interpret the rock record and construct models of ecosystem function in both ancient and modern environments.

    Thank you:

    Through this project, I’ve had the privilege of working with truly amazing people and making life-long friends.  The author list and acknowledgement are just the tip of the iceberg in terms of people who have contributed to this project in one way or another.  You all know who you are; I feel so lucky to have gotten to know and work with you. THANK YOU!

    This project was started as grass-roots style, curiosity-driven student research, and as such, the funding for it has been fairly eclectic.  I want to take a moment to acknowledge those organizations that have supported this kind of research and made my work possible.

    Funding to the MBL Microbial Diversity course from:

    • Howard Hughes Medical Institute
    • Gordon and Betty Moore Foundation (#2493)
    • National Science Foundation (DEB-0917499)
    • US Department of Energy (DE-FG02-10ER13361)
    • NASA Astrobiology Institute (NAI)

    Grants to collaborators Victoria Orphan and David Fike from:

    • NSF (EAR-1124389 & EAR-1123391)
    • Gordon and Betty Moore Foundation (#3306)

    Grad-student grants and fellowships supporting my work at UC Davis from:

    • National Science Foundation Graduate Research Fellowship
    • UC Davis Dissertation Year Fellowship
    • P.E.O. Scholar Award
    • NAI/APS Lewis and Clark Fund in Astrobiology
    • NSF Doctoral Dissertation Improvement Grant (DEB-1310168)

    Full citation:

    Wilbanks EG, Jaekel U, Salman V, Humphrey PT, Eisen JA, Facciotti MT, Buckley DH, Zinder SH, Druschel GK, Fike DA, Orphan VJ. (2014) “Microscale sulfur cycling in the phototrophic pink berry consortia of the Sippewissett Salt Marsh.” Environmental Microbiology, doi:10.1111/1462-2920.12388

    The story behind “Programmable removal of bacterial strains by use of genome-targeting CRISPR-Cas systems”

    Below is another in the “Story Behind the Paper” series.  This one is by Chase Beisel from NC State.

    In this guest post, I tell the story behind the paper my colleagues and I recently published in mBio.
This post briefly recounts the trials and travails of my research group’s first publication and describes the remarkable versatility of CRISPR. Before launching into the highs and lows of the idea-to-paper process, I want to thank Jonathan for this unique opportunity to share our story.

    CRISPR: an abbreviated tutorial for the uninitiated

    Our paper offers a novel application of the CRISPR-Cas adaptive immune systems. Unlike humans’  adaptive immune systems, CRISPR-Cas systems use RNA to recognize foreign invaders. Recognition occurs through simple base pairing between the RNA (called CRISPR RNAs) and complementary foreign nucleic acids, leading to target cleavage and degradation. Through a poorly understood mechanism, these systems can acquire new CRISPR RNA-encoding sequences, providing immunity against future infections.

    Overview of DNA-targeting CRISPR-Cas systems.

    One of the remarkable aspects of CRISPR-Cas systems is that synthetic CRISPR RNAs can be designed to guide cleavage of almost any DNA sequence. This ability in turn has opened a remarkably broad set of applications, including genome editing, transcriptional activation and repression, phage defense, genotyping, synthetic restriction enzymes, and curing of latent viruses. I have no doubt that (1) I neglected to mention a few and (2) others will be reported in the coming months.

    An idea is born

    The story begins when I was a postdoc in the Storz lab at the National Institutes of Health. I was characterizing Hfq-binding small RNAs in E. coli and was interested in understanding and exploiting regulatory RNAs. It was during this time I learned about CRISPR-Cas systems. I was intrigued about the parallels between these systems and RNA interference–the focus of my PhD thesis–where the principal difference was that CRISPR-Cas systems seemed to go after DNA whereas RNA interference went after RNA. Note that this was also at a time (2010) when little was known about the system, let alone its biotechnological potential. At the time, the system had been shown to go after the DNA of foreign invaders, although one of the initial questions was why it didn’t go after its own DNA. While excellent work by Luciano Marraffini and others showed that a few safeguards were in place to prevent self-targeting, other work by Rotem Sorek and Udi Qimron suggested or demonstrated that CRISPR could target the genome and (most importantly) that this was a bad thing.

    The idea.

    During a trip to the University of Washington to visit my colleague Georg Seelig, we concocted the idea of targeting the microbial genome with CRISPR on purpose. What was so appealing about the idea was that (1) we were inducing the equivalent of an autoimmune response, (2) targeting would be sequence-specific, and (3) the mechanism of attack was independent of how antibiotics act. For the latter two reasons, we saw genome-targeting CRISPR RNAs as a “smart” antibiotic that could selectively kill bacteria and circumvent antibiotic resistance. Suffice it to say, we were excited.

    Obtaining funding (or not)

    Our first step was to obtain funding for this idea. We first tried the Gates Foundation and the USAMRMC, although neither organization funded the work. Later, I submitted the idea to ARO, NSF, and the Pew Research Foundation. Still no funding. Fortunately, an internal funding source at NC State University provided a small grant to pursue the idea. This grant and my start-up funds were sufficient to carry the project to completion.

    The long research path

    While Georg focused on other pursuits, I began my faculty position at NC State and made this idea one of my lab’s first projects. My initial goal was to evaluate how plasmids encoding genome-targeting CRISPR RNAs affect transformation efficiency, an imperfect but reasonable proxy of killing. Heidi Klumpe, a talented undergraduate student who joined my fledgling lab, cloned most of our initial constructs. Unfortunately, we had to go through a few design rounds before finding a construct in which we could easily and cheaply clone in new CRISPR RNAs. During this time, one of my first graduate students, Ahmed Abdelshafy Gomaa, joined the group and began working with Heidi. The two made great progress and, after ample troubleshooting and optimization, settled on a system that showed large reductions in the transformation efficiency (~10⁵-fold) when targeting the genome. Anticipating the potential to be scooped (a common experience in the CRISPR field), I convinced Michelle Luo, a more recent graduate student in the group, to help advance the experiments. In the end, the three students were doing endless transformations and dilution plating, then counting colonies over and over again. I am grateful that they never complained.

    What was intriguing about these experiments was that only two “design rules” needed to be followed: (1) find a protospacer-adjacent motif or PAM–a short sequence recognized by some Cas proteins–and (2) incorporate the adjacent sequence into a CRISPR RNA. It didn’t matter which sequences we targeted, or whether the sequences were in coding regions, non-coding regions, top strands, or bottom strands. As long as we followed these rules, there was a tremendous reduction in the transformation efficiency.
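The two design rules can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the paper: the function names are hypothetical, the PAM is passed in as a parameter because its sequence depends on the CRISPR-Cas system, the default spacer length of 30 is a placeholder, and the sketch assumes the protospacer sits immediately 3' of the PAM on the scanned strand (in some systems it sits on the other side).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_protospacers(genome: str, pam: str, spacer_len: int = 30):
    """Yield (strand, start, protospacer) for every PAM hit on both strands.

    Assumption: the protospacer lies immediately 3' of the PAM.
    `start` is the 0-based index of the protospacer on the strand that
    was scanned ("+" is the input, "-" is its reverse complement).
    """
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        pos = seq.find(pam)
        while pos != -1:
            start = pos + len(pam)
            candidate = seq[start:start + spacer_len]
            if len(candidate) == spacer_len:  # skip hits too close to the end
                yield strand, start, candidate
            pos = seq.find(pam, pos + 1)  # allow overlapping PAM hits


# Toy example with a made-up sequence and a placeholder PAM.
hits = list(find_protospacers("TTAGGACGTACCC", pam="AGG", spacer_len=5))
```

Each returned protospacer is a candidate sequence to build into a CRISPR RNA; a real design would also need to check details that depend on the specific Cas proteins being used.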

    Targeted removal. Credit: C. Beisel/mBio.

    We next wanted to prove that the sequence specificity of killing could differentiate even closely related strains. After much debate about which strains to test, we chose our K-12 strain of E. coli and a B strain, one of its cousins. We needed to find unique sequences between the two genomes, and, although there are likely simple bioinformatics tools to do this, Ahmed manually went through the genomes to find unique sequences. Fortunately, he didn’t have to work too hard despite the fact that the bacteria share 99% of the genomic content. The resulting tests confirmed our predictions: target one strain and only that strain transforms extremely poorly. We incorporated Salmonella to differentiate commensals and pathogens (and to increase the attractiveness of this work to publishers), although these experiments were delayed as we sought BSL2 approval.
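For readers wondering what a simple bioinformatics approach to this step might look like, here is a minimal sketch based on comparing k-mers (all substrings of length k). It is not what Ahmed did, who searched by hand; the function names and the default k of 30 are assumptions for illustration, and holding every k-mer of a full genome in memory is a simplification that a dedicated tool would avoid.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target: str, other: str, k: int = 30) -> set:
    """Return k-mers found in `target` but on neither strand of `other`."""
    other_kmers = kmers(other, k) | kmers(reverse_complement(other), k)
    return kmers(target, k) - other_kmers


# Toy example with made-up sequences standing in for two related genomes.
strain_specific = unique_kmers("ACGTTTGA", "ACGTTTCC", k=4)
```

Any k-mer that survives this filter is a candidate strain-specific target, provided it also sits next to a suitable PAM.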

    A fruitful collaboration

    During this time, I met Rodolphe Barrangou, a giant of the CRISPR field who was still working in industry. We struck up a friendship that later led to an ongoing collaboration once he decided to join NC State’s faculty. Rodolphe has been working with Streptococcus thermophilus, which encodes four different CRISPR-Cas systems. Through our interactions, we decided that demonstrating genome targeting through two of its endogenous CRISPR-Cas systems would further strengthen the story. Fortunately, the data quickly came thanks to the efforts of Rodolphe’s first lab member, Kurt Selle. With these data, we felt that we had a sufficient story to submit for publication.

    The publication process

    Based on the novelty of the idea, general interest in all things CRISPR, our data, and (to a certain degree) my own naivety, we shot high. Unfortunately, we didn’t make it past the editors at Nature Biotechnology, so we next tried submitting a Brief Communication to Nature Chemical Biology. The editors were kind enough to send it out for review, although the reviewers were not so kind, questioning the novelty of the idea and its downstream potential. However, the reviews were extremely helpful as we repackaged the work and performed additional experiments demonstrating selective removal in mixed cultures and the selective titration of individual strains.

    Encouraged by the new version of the paper, we next tried PNAS. However, we didn’t make it past the Editorial Board, so we moved on to Nucleic Acids Research. Again, the editors said “no”–in this case, because our paper fell outside of the scope of the journal. Not sure where to go next, we chose mBio, an up-and-coming Open Access journal that publishes broadly across the field of microbiology. Half expecting another rejection before review, we were pleasantly surprised that the paper went out and received positive reviews. After a month of additional experiments, we were able to resubmit the final version that was accepted shortly thereafter. I received the acceptance email on December 20th–a wonderful Christmas present.

    The aftermath

    Matt Shipman in the News Services Office at NC State prepared a press release for the article–a collaboration I would recommend to researchers who have not interacted with their institution’s news office. That said, inaccurately written releases can promise too much, creating false impressions of the work’s potential and (if nothing else) annoying your colleagues.

    Thankfully, our press release was picked up by a number of science websites. Nature also highlighted this work in its most recent issue, though I’m not sure whether their interest had anything to do with the press release. Most importantly, through the press release, Matt put me in touch with Jonathan, and the rest is history.

    Story behind the paper guest post by Corey Nislow (w/ Metka Lenassi) on "Genomics w/o Borders"

    Below is another in the “Story behind the paper” series of guest posts here.  This one is from Corey Nislow w/ Metka Lenassi.  If anyone else has published an open access paper on anything relating to this blog and would like to write a guest post on the Story behind the paper, please let me know.

    Genomics without Borders: Genome Sequence of the Extremely Halotolerant Yeast Hortaea werneckii 

    by Corey Nislow (with Metka Lenassi)

    In this guest post (thank you Jonathan!) I wanted to tell the story behind a paper that my colleagues and I published two weeks ago in PLoS ONE. The story also offers an opportunity to talk about what role, if any, a middle author can play in a scientific study.

    The story is set in Slovenia, a beautiful country which was part of the former Yugoslavia and which is home to about 2 million inhabitants, 2400+ fungal species (thanks Wikipedia) and some very interesting environments. One of these environments is the Secovlje Salterns where one can find the yeast Hortaea werneckii.

    A worker harvests sea salt in the Secovlje salterns, July 17, 2010. Some 2600 tons of salt is expected to be produced during the two and a half month season at the salterns.(Xinhua/Reuters Photo)

    I hadn’t heard of Hortaea until I started googling around looking for a yeast extremophile that I could grow in the lab to dissect out its nucleosomes and ask questions regarding nucleosome occupancy and transcription in the face of extreme environments. Turns out it was not a crazy idea–

    13 years ago a peculiar black yeast, Hortaea werneckii, was isolated from its natural habitat: waters containing so much salt that it would kill most living organisms instantly. Since then, two small (but enthusiastic) Slovenian groups have tried to understand its halotolerance. This demanded field trips to the beautiful Slovenian coast, but also a lot of hard work and inventiveness in optimizing protocols used for other organisms – and doing it on a low budget. The first important obstacle was actually cultural – to persuade the scientific community that such an extreme yeast even exists in nature! You can see it below. We now have ample evidence, as Hortaea has been isolated from many seawater-related environments and saline lakes, but also from surface layers of tropical microbial mats in salterns and even from spider webs in Atacama Desert caves. All these different Hortaea strains are now waiting in their freezer (the Ex culture collection) to be analyzed.

    Hortaea werneckii growing happily on 2M salt.

    The figure below summarizes what was known about halotolerance of Hortaea before the genome sequence was decoded. In brief, high salinity is detected by sensors of the HOG signaling pathway (green arrows), which modulate the expression of salt-responsive genes (underlined green). The expression of other genes also varies: genes with higher expression at high salinity are written in red, repressed genes in blue. The impact of a hyperosmolar environment is countered by increasing the energy supply to drive energy-demanding processes such as export of Na+ and H+, import of glycerol and the synthesis of compatible solutes. Melanization of the cell wall reduces the leaking of solutes from the cells and restructuring of membrane lipids helps preserve the integrity of the cells. Read this paper if you want to know more: Gostinčar et al, Adv Appl Microbiol. 2011;77:71-96.

    Gostinčar et al, Adv Appl Microbiol. 2011;77:71-96

    This critter, as our recent paper reports, is as interesting genotypically as it is phenotypically. The full genome sequence reported in the PLoS ONE paper shows that the genome size is 51 Mb, quite a bit larger than that of its closest relatives, and given the number of gene models detected (20,000!), for all intents and purposes it looks like Hortaea underwent whole genome duplication last weekend!

    My interest piqued, I immediately started searching for the genome sequence to have a reference to map nucleosome sequencing reads. Turns out, I had requested the strain from Metka years ago, only to find that one of our lab mates, Uros, with whom I was collaborating at the University of Toronto, had performed some of the groundwork on Hortaea for his PhD. But the network connections don’t stop here: I moved to the University of British Columbia last year, and as it happened Hortaea is popular in Vancouver too! Our new colleagues at UBC were working together with the Slovenian team on sequencing and analyzing the Hortaea genome. In fact, the collaboration started in 2005, catalyzed by a poster at the Budapest FEBS conference where Metka, at that time still a PhD student, and Ivan Sadowski started a discussion about the interesting phenotypic switch that Hortaea undergoes between yeast and filamentous forms. So, by virtue of a convergence of curiosity, good luck and generous collaborators I had the good fortune of being an active participant in the study.

    So how does this have anything to do with what a middle author does or doesn’t do on a manuscript? And why do I care? 

    Well, I recently re-read the comments section of a fellowship application, and ginned up the guts to read the “supervisor/training environment” section. The chief criticism was that I have a lot of papers on which I am not the senior author. So to the skeptics, I would say: even middle authors play important roles in bringing a study to an audience. In my case my self-interests guided my actions, but along the way, I had the chance to learn about an extraordinary critter, and an amazing group of Slovenian scientists. Yeah, I needed the genome sequence, but I was also excited to help draft the manuscript, have our sequencing facility prepare additional libraries to close some gaps, and now to bring attention to this extraordinary critter.

    The genome sequence offers an exciting new start in studies of Hortaea werneckii. Going forward, the Slovenians want to study its transcriptome and proteome in response to increasing salinity. Preparing knock-out mutants is also a must, to find key genes important for halotolerance. We definitely want to take a closer look at all those cation transporters and their functions. It would also be fun to find its mating partner in one of those frozen Hortaea samples. And now that the genome sequence is available to everybody, the research on this extremely interesting species may start to gain more appeal even to researchers beyond the two stubborn Slovenian groups.

    Although I might not get to Slovenia in the foreseeable future, I wouldn’t be surprised if one of my graduate students will meet up with the group at an upcoming yeast meeting. This particular student is dragging our lab into evolutionary genomics by trying to see if he can’t get Hortaea to lose some of its genome in long-term culture (I can’t help but think of “Amadeus” where Salieri is telling Mozart that the composition is fine but it has too many notes….). I’m sure the results will be surprising, and am also encouraged to see what our future collaboration will bring.

    New paper from the Eisen lab: Sporulation phylogenetic profiling

    Quick post here. This paper came out a few months ago but it was not freely available so I did not write about it until now as it just showed up in Pubmed Central. It was published in the Journal of Bacteriology but they do not release material for free onto their website or Pubmed Central for a few months. Alas, as I was kind of a peripheral player in the main work in the paper (I helped them with the phylogenetic profiling part) I did not end up pushing as hard as I should have for paying the open access fee to make it available earlier / openly.

    Here is a link to the paper: Gene Conservation among Endospore-Forming Bacteria Reveals Additional Sporulation Genes in Bacillus subtilis.

    It is from Richard Losick’s lab at Harvard and it is one I am very very pleased with. Basically, Losick’s lab has been studying sporulation in Bacillus subtilis like forever. And in 2005 we wrote a paper on the genome of another member of the same phylum that also sporulates (Carboxydothermus hydrogenoformans): Life in Hot Carbon Monoxide: The Complete Genome Sequence of Carboxydothermus hydrogenoformans Z-2901.

    And in that paper we did a phylogenetic profile based analysis of sporulation genes and found a set of genes that were (on average) in all the sporulating species and not in non sporulating species.  Among this set of genes were quite a few that nobody had ever shown to be involved in sporulation.  We predicted that they were likely involved in sporulation. 
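For anyone unfamiliar with phylogenetic profiling, the core idea can be sketched as a filter over a presence/absence table. This is a toy illustration and not the analysis from the 2005 paper: the function name, the thresholds, and the gene and species labels are all hypothetical. The `min_in` and `max_out` parameters are there because the match only needs to hold on average, not perfectly.

```python
def sporulation_candidates(presence, sporulating, non_sporulating,
                           min_in=1.0, max_out=0.0):
    """Return genes whose presence pattern tracks the sporulation trait.

    presence: dict mapping gene name -> set of species with a homolog.
    min_in:   minimum fraction of sporulating species that must have the gene.
    max_out:  maximum fraction of non-sporulating species allowed to have it.
    """
    candidates = []
    for gene, species in presence.items():
        frac_in = len(species & sporulating) / len(sporulating)
        frac_out = len(species & non_sporulating) / len(non_sporulating)
        if frac_in >= min_in and frac_out <= max_out:
            candidates.append(gene)
    return sorted(candidates)


# Made-up profiles: s1-s3 sporulate, n1-n2 do not.
sporulating = {"s1", "s2", "s3"}
non_sporulating = {"n1", "n2"}
presence = {
    "g1": {"s1", "s2", "s3"},              # only in sporulators
    "g2": {"s1", "s2", "s3", "n1", "n2"},  # in every species
    "g3": {"s1", "s2"},                    # missing from one sporulator
    "g4": {"n1"},                          # only in a non-sporulator
}
strict = sporulation_candidates(presence, sporulating, non_sporulating)
relaxed = sporulation_candidates(presence, sporulating, non_sporulating, min_in=0.6)
```

Genes that pass the filter but have no known role in sporulation are the interesting ones, since their distribution alone suggests a function.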
    And then I waited, since I did not really work on sporulation.  And in a series of discussions with Losick and people in his lab, I found out that they had evidence that many of these genes in B. subtilis were involved in sporulation.  And the latest paper is in essence a follow-up on some of those discussions (well, really it is a lot of work from Losick’s lab with a little input from those conversations to guide some of the experimental tests).