Why endosymbionts rule – see #PLoS Genetics paper on origin of an alternative genetic code

Way way way cool new paper in PLoS Genetics from Nancy Moran’s lab: Origin of an Alternative Genetic Code in the Extremely Small and GC–Rich Genome of a Bacterial Symbiont. The paper uses genome sequencing and proteomics (as well as a variety of bioinformatic analyses) to study a bacterial symbiont (Hodgkinia) of cicadas.

And for those not in the know, this is an Open Access paper using a broad Creative Commons license (since it is in a PLoS journal) so anyone can reuse material from it as long as the source is cited. This image to the left is from their paper so I am citing the source here: McCutcheon JP, McDonald BR, Moran NA (2009) Origin of an Alternative Genetic Code in the Extremely Small and GC–Rich Genome of a Bacterial Symbiont. PLoS Genet 5(7): e1000565. doi:10.1371/journal.pgen.1000565

The study has some interesting things including:

  • the genome of the symbiont has a much higher GC content than other small bacterial genomes for which the sequence is available
  • the symbiont is a member of the alpha proteobacterial group, which is somewhat unusual since most other insect endosymbionts that have been studied are from the gamma proteobacterial group and/or the bacteroidetes clade
  • the UGA codon in this species is used to encode tryptophan and not as a stop codon

Taken together these things are very interesting since other species that have been found to have the UGA codon reassigned to code for an amino acid all have low genomic GC content. This correlation led people to conclude that the codon reassignment was directly related to the low GC content. However, the authors suggest here that the UGA reassignment in many species might be due to the genome reduction (loss of genes) seen in endosymbionts and not to low GC content.
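To make the reassignment concrete, here is a minimal sketch (mine, not from the paper) of how the same DNA reads under the standard code versus a code in which UGA (TGA in DNA) encodes tryptophan. The function and variable names are just illustrative, and the test sequence is made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Alternative code: identical to the standard one except TGA (UGA) -> Trp (W).
UGA_TRP_CODE = dict(STANDARD_CODE, TGA="W")


def translate(dna, code):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy_gene = "ATGTGAAAATAA"  # made-up sequence: ATG TGA AAA TAA
print(translate(toy_gene, STANDARD_CODE))  # truncated at the TGA
print(translate(toy_gene, UGA_TRP_CODE))   # reads through the TGA as Trp
```

The practical point is that a gene finder using the standard code on a genome like this would call many genes as truncated, which is one reason proteomics is a useful check.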

Anyway the paper is worth a read …

McCutcheon, J., McDonald, B., & Moran, N. (2009). Origin of an Alternative Genetic Code in the Extremely Small and GC–Rich Genome of a Bacterial Symbiont PLoS Genetics, 5 (7) DOI: 10.1371/journal.pgen.1000565

Aphid-bacterial symbiosis in more detail, and in the New York Times

Nice little bit in the New York Times tomorrow about aphids and their symbionts. Henry Fountain writes (Observatory – How Tiny Insects, With a Little Help, Survive on Plant Sap – NYTimes.com) about a new article by Angela Douglas, one of the true pioneers of endosymbiont research. In her study she dissects in fine-scale detail which essential amino acids are missing from the aphids' sap-only diet and which ones are made by the symbionts. Interestingly, the research apparently shows that the aphids may have figured out how to make methionine by themselves. I say apparently since I have been unable to track down the paper, which I assume is coming out soon.

I should note that in one symbiosis like this that I studied with Nancy Moran, we found that there were two symbionts contributing to the nutrition of the host. We found that one of the symbionts was likely making amino acids for the host (an insect called the glassy winged sharpshooter, which eats only xylem sap) and the other symbiont was likely making vitamins. Nancy showed later with John McCutcheon that the symbiont that was making vitamins also was predicted to be making methionine for the host. So it seems possible there might be a missing symbiont in the aphid study? Although it would be cool if the aphid has figured out how to make an amino acid most animals are not able to make.

Hat tip to Max Lambert for pointing this out.

Mutualisms Rule – So Says Olivia Judson at the Wild Side

Nice blog today on mutualisms by Olivia Judson who writes the Wild Side blog/column for the New York Times (I seem to be writing a lot about writers for the NY Times these days … not sure what is going on with that). She even features one of my favorite organisms in the blog:

The clam Calyptogena magnifica, which lives on deep-sea vents, depends on a bacterium to supply it with nutrients; the bacterium is transmitted through the clam’s eggs

Last year we published a paper on the complete genome sequence of this symbiont (which I wrote about here when I was clearly in a whiny kind of mood). And Judson picks up on a part of the story on the clam that is rarely discussed – the symbionts are transmitted vertically from parent to offspring. Vertical transmission seems to be linked to multiple properties of the symbionts (see my discussion of this regarding the glassy winged sharpshooter symbionts here).

Judson’s post is really worth checking out for the symbiosis fans out there. She does a good job of highlighting diversity and evolution of mutualisms in a relatively short post.

See my video of a dissection of a baby Calyptogena:

Sharpshooters, dual symbioses and new ways to sequence a genome

Those interested in symbioses and in new sequencing methods should look at a paper that just came out in PNAS by John McCutcheon and Nancy Moran (OK – I am a bit biased – this work is related to something I did previously with Nancy). Their paper reports a further dissection of a dual symbiosis in sharpshooters (a group of insects that feed on xylem sap). The dual symbiosis involves two types of bacteria that live inside specialized cells in the gut of these insects.

Previously, my group had worked with Nancy to sequence the genome of one of the symbionts (Baumannia) as well as part of the genome of the second one (Sulcia). Nancy was interested in this symbiosis for many reasons, including that as obligate xylem feeders the sharpshooters almost certainly were not getting all the nutrients they needed in their diet. Based on what was known about bacterial symbionts in other sap feeding insects (e.g., aphids) it seemed likely that the symbionts of the sharpshooters were making the missing nutrients for their host. However, all previous genomic based studies had been done on phloem feeding insects like aphids. Phloem and xylem are the two main circulatory systems in plants. Phloem tends to be nutrient rich, although still not rich enough for the aphids to live on it alone. Thus the aphids rely on bacterial symbionts to make amino acids missing in the phloem.

Xylem is generally much poorer in nutrients and thus Nancy wanted to compare the genomes of the symbionts of xylem feeders with those of phloem feeders. Nancy and others had done preliminary work on the sharpshooters showing that they had multiple symbionts living inside cells in their gut and that one of the symbionts (which she named Baumannia after Paul Baumann, with whom she had worked previously) was closely related to the Buchnera symbionts found in aphids.

So Nancy approached me when I was at TIGR and asked if I would be interested in helping her sequence the Baumannia genome. I said yes (secretly, truth be told, I would have tried to sequence the genome of a rock if Nancy asked. She is perhaps the smartest person I know in all of science and is always doing the coolest types of research. Plus, I figured, I might also be able to interact with her husband, Howard Ochman, who also does cool stuff).

Of all the possible sharpshooters (the symbionts are found in all sharpshooters), Nancy chose to focus on the glassy winged sharpshooter because it is an important pest organism (it is a vector for Pierce’s disease in grapes).

So – we (well, the core facility at TIGR under my supervision) sequenced the Baumannia genome using DNA that Nancy had isolated from dissections of the gut of glassy winged sharpshooters. In analysis of the genome we (well, again, the royal we — in this case Dongying Wu in my lab did most of the analysis) found, among many things, that Baumannia appeared to be making vitamins and cofactors for the host. But alas, we also found something missing — Baumannia did not appear to be able to make amino acids for the host. Since xylem was likely to be missing amino acids that all animals require in their diet, we had figured that Baumannia must be making them for the host. So we were vexed.

That was, until Nancy pointed out (or reminded us – since she probably had mentioned it before) that there was another symbiont living in the gut of these insects — a symbiont called Sulcia. She suggested that we look at the DNA sequence pieces that did not assemble with the Baumannia genome and look for any that might encode genes similar to genes from the group of bacteria in which Sulcia is found. And, 1.5 years later, after much informatics and lab work, we obtained about 130 kb of the genome of this second symbiont and found that it encoded at least some of the essential amino acid synthesis pathways that could make the needed amino acids for the host. And we stopped there, published a paper in PLoS Biology proposing the existence of a dual symbiosis with one symbiont making vitamins and cofactors and the other making amino acids, and moved on to other things.

Now in this new paper, Nancy’s lab has returned to this symbiosis and has finished the genome of Sulcia (the genome is available here in Genbank). And the story just gets cooler and cooler. With this complete genome they get a more detailed picture of the symbiosis than we were able to obtain, and are able to really reconstruct the whole system (and correct some mistakes we had made in our paper). My favorite thing in their paper is Figure 3 which you can find here (I am not sure about the PNAS policy of putting the image in my blog since this does not seem to be an Open article). This figure shows their reconstruction of what could be called community metabolism. Interestingly, it appears the symbionts depend on each other and are not just passing things on to the host separately.

Another important aspect of their paper is that it is the first (as far as I know) example of a genome being finished using a combination of the two hot new sequencing methods – 454/Roche and Illumina/Solexa. Basically they used the Roche/454 method to provide deep coverage of the Sulcia genome and then used Illumina/Solexa sequencing to get accurate sequence data for the types of sequence for which the Roche method does not work well.

So – check out the paper in PNAS. You won’t regret it.

Symbionts have feelings too

The Onion, one of the key sources of all things wise and accurate in science, is reporting on a sad story all too common among symbionts. They report:

After three rainy seasons together, a black rhinoceros and a parasite-eating tickbird are beginning to suspect that their symbiotic relationship has fallen into a rut, the couple reported Sunday.

And furthermore

“The rhino and tickbird may have evolved physiologically to meet each other’s needs, but it’s clear they haven’t evolved emotionally,” the elephant said. “They need to recognize that in order to go forward. The rhino’s loud snorting is very alienating. And obviously the tickbird is projecting her own feelings of inadequacy when she criticizes the rhino for being a typical Diceros bicornis.”

In other words – mutualisms are not the simple “You scratch my back, I’ll scratch yours” they are presented to be. Symbionts have feelings too.

PS – Thanks to Sourav Chatterji in my lab for pointing out the Onion story.

From bad to good – how a parasite became a mutualist


Just saw this very cool paper in PLoS Biology on Wolbachia that appear to have converted from parasites to mutualists. Wolbachia are among my favorite organisms. They are intracellular bacteria that have been found to infect a wide diversity of invertebrate species. In many cases, the Wolbachia have male specific detrimental effects (I like to call them WMDs – Wolbachia of male destruction). In other cases (e.g., in filarial nematodes), Wolbachia appear to be beneficial.

I had heard about the work in the new paper from one of the authors, Michael Turelli, who was one of the main people to convince me to move to Davis. In this study, the authors returned to examine a population of Drosophila simulans that Turelli had studied some 20 years ago. In the previous studies Turelli and colleagues had found a “classic case” of Wolbachia infection spreading in nature. When they returned to study the population and did a suite of experiments, they found that the Wolbachia had acquired fecundity-increasing mutations, making them mutualistic.

Though they have not yet figured out what mutations occurred, it seems that a little genome sequencing might help them. Just a little selfish plug there, since I led the first project to sequence a Wolbachia genome and would love to do some more …

For more information, see Weeks AR, Turelli M, Harcombe WR, Reynolds KT, Hoffmann AA (2007) From Parasite to Mutualist: Rapid Evolution of Wolbachia in Natural Populations of Drosophila. PLoS Biol 5(5): e114 doi:10.1371/journal.pbio.0050114.

Weeks, A., Turelli, M., Harcombe, W., Reynolds, K., & Hoffmann, A. (2007). From Parasite to Mutualist: Rapid Evolution of Wolbachia in Natural Populations of Drosophila PLoS Biology, 5 (5) DOI: 10.1371/journal.pbio.0050114

Wu, M., Sun, L., Vamathevan, J., Riegler, M., Deboy, R., Brownlie, J., McGraw, E., Martin, W., Esser, C., Ahmadinejad, N., Wiegand, C., Madupu, R., Beanan, M., Brinkac, L., Daugherty, S., Durkin, A., Kolonay, J., Nelson, W., Mohamoud, Y., Lee, P., Berry, K., Young, M., Utterback, T., Weidman, J., Nierman, W., Paulsen, I., Nelson, K., Tettelin, H., O’Neill, S., & Eisen, J. (2004). Phylogenomics of the Reproductive Parasite Wolbachia pipientis wMel: A Streamlined Genome Overrun by Mobile Genetic Elements PLoS Biology, 2 (3) DOI: 10.1371/journal.pbio.0020069

Symbiont Effects on Host Can Be Mediated by a Single Point Mutation

Just saw the news about an article in PLoS Biology by Nancy Moran and colleagues. In their paper, which studied bacterial symbionts of aphids, they show that mutations in the gene encoding a heat shock protein in the symbiont influence the heat tolerance of the aphid hosts. In turn this means that these mutations influence aphid geographical range and ecology. It is a really cool story (Nancy Moran seems to publish a cool story like this every other week; I feel lucky to have worked with her on one symbiont project which I have written about here).

To read more about the Moran work, go to the article, which anyone can read since it is in PLoS Biology. Or go to the press releases such as here.

Dunbar, H., Wilson, A., Ferguson, N., & Moran, N. (2007). Aphid Thermal Tolerance Is Governed by a Point Mutation in Bacterial Symbionts PLoS Biology, 5 (5) DOI: 10.1371/journal.pbio.0050096

Glassy winged sharpshooter symbionts

As in earlier posts — I am posting one of my Open Access publications here … this one is on genomics of symbionts of the glassy winged sharpshooter. The citation is

Citation: Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, et al. (2006) Metabolic Complementarity and Genomics of the Dual Bacterial Symbiosis of Sharpshooters. PLoS Biol 4(6): e188 doi:10.1371/journal.pbio.0040188

Metabolic Complementarity and Genomics of the Dual Bacterial Symbiosis of Sharpshooters

Dongying Wu1, Sean C. Daugherty1, Susan E. Van Aken2, Grace H. Pai2, Kisha L. Watkins1, Hoda Khouri1, Luke J. Tallon1, Jennifer M. Zaborsky1, Helen E. Dunbar3, Phat L. Tran3, Nancy A. Moran3, Jonathan A. Eisen1*¤

1 The Institute for Genomic Research, Rockville, Maryland, United States of America, 2 J. Craig Venter Institute, Joint Technology Center, Rockville, Maryland, United States of America, 3 Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, Arizona, United States of America

Mutualistic intracellular symbiosis between bacteria and insects is a widespread phenomenon that has contributed to the global success of insects. The symbionts, by provisioning nutrients lacking from diets, allow various insects to occupy or dominate ecological niches that might otherwise be unavailable. One such insect is the glassy-winged sharpshooter (Homalodisca coagulata), which feeds on xylem fluid, a diet exceptionally poor in organic nutrients. Phylogenetic studies based on rRNA have shown two types of bacterial symbionts to be coevolving with sharpshooters: the gamma-proteobacterium Baumannia cicadellinicola and the Bacteroidetes species Sulcia muelleri. We report here the sequencing and analysis of the 686,192–base pair genome of B. cicadellinicola and approximately 150 kilobase pairs of the small genome of S. muelleri, both isolated from H. coagulata. Our study, which to our knowledge is the first genomic analysis of an obligate symbiosis involving multiple partners, suggests striking complementarity in the biosynthetic capabilities of the two symbionts: B. cicadellinicola devotes a substantial portion of its genome to the biosynthesis of vitamins and cofactors required by animals and lacks most amino acid biosynthetic pathways, whereas S. muelleri apparently produces most or all of the essential amino acids needed by its host. This finding, along with other results of our genome analysis, suggests the existence of metabolic codependency among the two unrelated endosymbionts and their insect host. This dual symbiosis provides a model case for studying correlated genome evolution and genome reduction involving multiple organisms in an intimate, obligate mutualistic relationship. In addition, our analysis provides insight for the first time into the differences in symbionts between insects (e.g., aphids) that feed on phloem versus those like H. coagulata that feed on xylem. 
Finally, the genomes of these two symbionts provide potential targets for controlling plant pathogens such as Xylella fastidiosa, a major agroeconomic problem, for which H. coagulata and other sharpshooters serve as vectors of transmission.

Funding. Funding was from National Science Foundation Biocomplexity grants 9978518 and 0313737.

Academic Editor: Julian Parkhill, The Sanger Institute, United Kingdom

Received: October 21, 2005; Accepted: April 10, 2006; Published: June 6, 2006

Copyright: © 2006 Wu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abbreviations: CDS, protein-coding gene; LPS, lipopolysaccharide; pI, isoelectric point; SNP, single nucleotide polymorphism

* To whom correspondence should be addressed. E-mail: jaeisen@ucdavis.edu

¤ Current address: UC Davis Genome Center, Department of Medical Microbiology and Immunology and Section of Evolution and Ecology, University of California Davis, Davis, California, United States of America

Introduction

Through mutualistic symbioses with bacteria, eukaryotes have been able to acquire metabolic capabilities that in turn have allowed the utilization of otherwise unavailable ecological niches. Among the diverse examples of such symbioses, those involving bacteria that live inside the cells of their host are of great interest. These “endo”-symbioses played a central role in the early evolution of eukaryotes (e.g., the establishment of the mitochondria and chloroplasts) and in many more recent diversification events such as animals living at deep-sea vents, corals, blood-feeding flies, carpenter ants, and several clades of sap-feeding insects.

Insects that feed primarily or entirely on sap are a virtual breeding ground for symbioses because this liquid rarely contains sufficient quantities of the nutrients that animals are unable to make for themselves. For example, the sole diet of most aphids is sap from phloem which is the component of the plant vascular system normally used to transport sugars and other organic nutrients. Despite the presence of many nutrients, phloem usually has little, if any, of the “essential” amino acids that cannot be synthesized by animals. To compensate, aphids engage in an obligate symbiosis with bacteria in the genus Buchnera, which, in exchange for sugar and simple, nonessential amino acids, synthesize the needed essential amino acids for their hosts.

The exact details of aphid-Buchnera interactions have been difficult to determine because no Buchnera has been cultivated outside its host. This limitation has been circumvented to a large degree by sequencing and analysis of multiple Buchnera genomes [1–3], which have provided detailed insights into the biology, evolution, and ecology of these symbioses. For example, despite having undergone massive amounts of gene loss in the time after they diverged from free-living Gammaproteobacteria, the Buchnera encode many pathways for the synthesis of essential amino acids. A critical component of these genomic studies is that, in most aphids, Buchnera is the only symbiont [4]. This implies that when genome-based metabolic pathway reconstructions suggest that a particular Buchnera is unable to make all the essential nutrients for its host, either the reconstructions are wrong, or the host must be getting those nutrients from its diet. For example, although one of the Buchnera strains is predicted to not be able to incorporate inorganic sulfur for the production of cysteine and other compounds, sulfur-containing organic compounds are known to occur in the diet of its host aphid [2].

In many other sap-feeding insects, including some aphids, several heritable bacterial types are found often living in close proximity within specialized structures in the insect body (e.g., [5–9]). This is apparently the case for all insects that are strict xylem-sap feeders, which include cicadas, spittlebugs, and some leafhoppers [5]. Xylem is the component of the plant vascular system that is primarily used to transport water and salts from the roots to the rest of the plant. Xylem sap has the lowest nitrogen or carbon content of any plant component and contains few organic compounds [10]. Although the composition varies among plant species and developmental stages, xylem fluid is always nutrient-poor, containing mostly inorganic compounds and minerals with small amounts of amino acids and organic acids [11–15]. As in phloem, the amino acids consist mainly of nonessential types such as glutamine, asparagine, and aspartic acid, with all essential ones absent or present in very low amounts.

Among xylem-feeders, sharpshooters (Insecta: Hemiptera: Cicadellidae: Cicadellinae) are a prominent group of about 2,000 species [10], many of which are major pests of agriculture due to their roles as vectors of plant pathogens. Sharpshooters are known to possess two bacterial symbionts. One, called Candidatus Baumannia cicadellinicola (hereafter Baumannia), resembles Buchnera in having small genome size and a biased nucleotide composition favoring adenine and thymine (A + T) and in belonging to the Enterobacteriales group in the Gammaproteobacteria [16]. The other symbiont, which was recently named Candidatus Sulcia muelleri (hereafter Sulcia), is in the Bacteroidetes phylum (formerly called the Cytophaga-Flexibacter-Bacteroides, or CFB, phylum) and is distributed widely in related insect hosts [9]. Both symbionts are vertically transmitted in eggs and are housed in a specialized bacteriome within developing nymphs and adults, and molecular phylogenetic studies show that both symbionts represent ancient associations dating to the origin of sharpshooters (Baumannia) or earlier (Sulcia) [5,9,16].

We sought to apply genome sequencing and analysis methods to the sharpshooter symbioses. For a host species, we selected the glassy-winged sharpshooter, Homalodisca coagulata. This pest species has a rapidly expanding geographic range and inflicts major crop damage as a vector for the bacterium Xylella fastidiosa, the agent of Pierce’s disease of grapes and other plant diseases [10]. Initially, we focused on the Baumannia symbiont with the idea that comparisons with the related Buchnera species would allow us to better identify differences that related to xylem versus phloem feeding. After completing the genome of this Baumannia, analysis revealed that many pathways that we expected to be present were missing. In contrast to the Buchnera-aphid symbioses, a second symbiont is present in sharpshooters, so we could not assume that the nutrients that would have been made by the missing pathways must be in the sharpshooter diet. Despite technical difficulties, we were able to obtain a significant portion of the genome of the Sulcia from the same wild-caught samples of H. coagulata.

Here we present the analysis of these two genomic datasets and the striking finding that the symbionts appear to work in concert, and possibly even share metabolites, to produce all of the nutrients needed by the host to survive on its xylem diet.

Results/Discussion

General Features of the Baumannia Genome and Predicted Genes

The genome of Baumannia consists of one circular chromosome of 686,192 base pairs (bp) with an average G + C content of 33.23% (Table 1). The genome size closely matches an earlier estimate from gel electrophoresis [16]. Baumannia has neither a strong GC skew pattern nor a dnaA homolog—two features commonly used to identify origins of replication in bacteria. A putative origin was identified and designated as position 1, based on a weak but clear transition in oligonucleotide skew.
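For readers who have not computed these quantities, the following is a minimal sketch of G + C content and windowed GC skew, using the common definition skew = (G − C)/(G + C). It is an illustration only: the function names, window size, and step are assumptions, and it does not reproduce the oligonucleotide skew analysis used to place the origin.

```python
def gc_content(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq) if seq else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); returns 0.0 if there are no G or C bases."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_gc_skew(seq, window, step):
    """List of (window start, GC skew) pairs along the sequence."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy example: the skew flips sign halfway along this made-up sequence,
# the kind of transition often used to suggest a replication origin or terminus.
print(windowed_gc_skew("GGGGCCCC", window=4, step=4))
```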

Table 1. General Features of the Genomes of Baumannia and Other Insect Endosymbionts

A total of 46 noncoding RNA genes were identified: six rRNAs (two sets of 16S, 5S, and 23S), one small RNA, and 39 tRNAs including at least one for each of the 20 amino acids. A total of 605 putative protein-coding genes (CDSs) were identified in the genome, and 89.9% of these can be assigned a putative biological function. An overview of the Baumannia genome and its encoded genes is illustrated in Figure 1, and features of these genes are summarized in Table S1. Only four of the CDSs lack detectable homologs in GenBank or other complete genomes and thus can be considered “orphan” genes.

Figure 1. Circular View of the Baumannia Genome

Circles correspond to the following features, starting with the outermost circle: (1) forward strand genes, (2) reverse strand genes, (3) χ² deviation of local nucleotide composition from the genome average, (4) GC skew, (5) tRNAs (green lines), (6) rRNAs (blue lines), and (7) small RNAs (red lines). Color legend for CDSs and number of genes in each category are at the bottom.

Evolution of Baumannia and the Genomes of Intracellular Organisms

Genome sequences have been found to be very useful in providing for better resolution and accuracy in phylogenetic trees than is achieved using single genes such as rRNA genes [17]. Although there are many ways to build genome-based trees, one particularly powerful approach is to identify orthologous genes between species and to combine alignments of these genes into a single alignment. We built a tree for Baumannia and related species from 45 ribosomal proteins using this concatenation approach (Figure 2A). This tree supports the rRNA-based grouping of Baumannia with the insect endosymbionts of the genera Buchnera, Wigglesworthia (symbionts of tsetse flies), and Blochmannia (symbionts of ants) [16]. However, the branching order is different in the protein tree with Baumannia being the deepest branching symbiont. As in prior genomic studies [18], the insect endosymbionts in the tree in Figure 2A are monophyletic (i.e., they share a common ancestor to the exclusion of all other species for which genomes are available). A possible close relationship of Baumannia to the other symbionts in the group is further supported by the presence of a substantial number of segments of conserved gene order (Figure 2B).
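The concatenation step described above can be sketched roughly as follows. This is an illustrative outline rather than the pipeline used in the paper: the function name, the dict-based input format, and the choice to pad a species missing from one alignment with gaps are all assumptions.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments into one alignment per species.

    alignments: list of dicts mapping species name -> aligned sequence;
    all sequences within one dict must have the same length.
    A species absent from a given alignment is padded with gap characters
    (an assumption made here for illustration).
    """
    species = sorted(set().union(*[aln.keys() for aln in alignments]))
    combined = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            combined[name].append(aln.get(name, gap * width))
    return {name: "".join(parts) for name, parts in combined.items()}


# Toy input: two tiny made-up "protein alignments" for three placeholder species.
gene1 = {"A": "MK-", "B": "MKL"}
gene2 = {"A": "GG", "C": "GA"}
print(concatenate_alignments([gene1, gene2]))
```

The concatenated alignment would then be handed to a tree-building program; that step is not shown here.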

Figure 2. Genome-Based Phylogenetic Analysis of Baumannia

(A) Maximum-likelihood tree of gamma-proteobacterial endosymbionts. The tree was built from concatenated alignments of 45 ribosomal proteins using the PHYML program. The bootstrap value is based upon 1,000 replications.

(B) Gene order comparison of Baumannia and Blochmannia floridanus. The plot shows the locations of homologous proteins between the two genomes.

All of these insect endosymbionts, including Baumannia, exhibit many genome-level trends commonly found in intracellular organisms when compared to free-living relatives, including a smaller genome, lower G + C content, a higher average predicted isoelectric point for encoded proteins, and more rapidly evolving proteins (Table 1, Figure 2A). Of critical importance to understanding these trends is that they occur in all types of intracellular organisms (e.g., mutualists and pathogens) from across the tree of life (archaea, bacteria, and eukaryotes). Much research has focused on trying to understand the mechanisms underlying these trends for which there are two major hypotheses: the loss of DNA repair genes resulting in subsequent changes in mutation patterns or changes in population genetic parameters that lead to more genetic drift [19,20].

As a global explanation, the population genetic forces have more support (e.g., [21–23]), but the issue is far from resolved. One reason for this lack of resolution is that it is usually difficult to reconstruct the early events in the evolution of intracellularity. This insect endosymbiont group has many advantages that have made it a model system for resolving these early events. The addition of the Baumannia genome further improves the utility of this group for reasons we detail below.

One limitation of studies of the evolution of intracellular organisms is that the evolutionary separation between free-living and intracellular species is usually very large. For example, although much can be learned about recent mitochondrial evolution by comparative analysis of mitochondrial genomes, it is not even known what subgroup of Alphaproteobacteria contains the closest free-living relative of these organelles. This is because the mitochondrial symbiosis originated billions of years ago. The insect endosymbionts lack this limitation both because their symbioses evolved relatively recently and because of the large diversity of genomes available for the Gammaproteobacteria. To make the most use of these benefits, it is imperative to have an accurate picture of the phylogeny of the symbionts. The addition of the Baumannia genome is useful in this regard because its proteins appear to be evolving more slowly (as indicated by shorter branch lengths in Figure 2A) than those in the other endosymbionts. Having one organism with relatively short branch lengths in this group makes it more likely that the monophyly of the insect endosymbionts in trees is a reflection of their true history and not an artifact of phylogenetic reconstruction known as long-branch attraction.

The branch-length finding is an example of how Baumannia can be considered as a “missing link” in that it is an intermediate in many ways between the other insect endosymbionts and free-living species. This is the case not only for branch length but also for phylogenetic position (it is the deepest branching species), isoelectric point (pI), and G + C content (Table 1). By filling in the gaps between the free-living and intracellular species, the Baumannia genome should allow better inferences of the early events in the evolution of intracellularity.

Baumannia is not intermediate in value between free-living species and other insect endosymbionts for all “intracellular” features. For example, its genome size is smaller than that of some of the other endosymbionts. This is an important finding since the absolute values for many other features are highly correlated, both in this group and in other symbiont groups [24]. An example of this is shown for pI and G + C content (Figure 3). Another way of looking at this is that the Baumannia genome has shrunk more than one might expect based on its other intracellular features. This decoupling of the rates of change of different features can be useful in understanding the patterns of evolution in intracellular species. For example, one explanation for the pattern in Baumannia is that although it has experienced more gene loss than some of the other insect endosymbionts, it has maintained the most complete set of DNA repair genes for the group (Table 2). This retention of repair functions may have slowed its rate of change in other features, such as sequence change. If true, this suggests that, although the general differences between intracellular and free-living species may be due to population genetic forces, the variation among intracellular species may be due in part to variation in DNA repair. Consistent with this is the finding that species with the longest branch lengths in the trees (Wigglesworthia and Blochmannia, Figure 2A) are those that are missing the mismatch repair genes (Table 2).

Figure 3. Correlation between Genomic G + C Content and the Average pI of the Proteins of Endosymbiotic and Free-Living Gammaproteobacteria

Species shown are Buchnera aphidicola APS (BaAPS), Buchnera aphidicola BP (BaBp), Buchnera aphidicola SG (BaSg), Baumannia (Bc), Blochmannia floridanus (Bf), Blochmannia pennsylvanicus (Bp), E. coli K12 (Ec), Wigglesworthia glossinidia (Wg), and Yersinia pestis KIM (Yp).

Table 2. Homologs of Genes Known to Be Involved in DNA Repair and Recombination in the Complete Genomes of Insect Endosymbionts

The differential loss of repair genes among organisms that share many other genome properties allows the insect endosymbiont group to serve as a model for studying the long-term effects of loss of various repair processes. For example, the consequences for genome evolution of losing recA can be examined by comparing Baumannia and Wigglesworthia, which retain it, to Buchnera, which lacks it. The same logic can be used to study the effects of the loss of the DNA replication initiation gene dnaA, which is missing from Baumannia (see above), Wigglesworthia, and Blochmannia [18,25] but is present in the other insect endosymbionts. Although the species without recA may be able to survive with little or no recombination, those lacking dnaA must make use of alternative initiation pathways. Some alternatives, such as pathways based on priA and recA [26], can be ruled out since at least one of these genes is missing from each of the species missing dnaA. The recBCD genes may play some role in initiation. This would explain why the recBCD genes are present in all insect endosymbionts (Table 2), including those missing recA, which is required for the “normal” role of recBCD in recombination.

Single Nucleotide Polymorphisms Are Abundant in the Baumannia Population

Genetic variation among individuals is both a complication of genome sequencing projects of uncultured species and a valuable source of information about microbial populations. For the Baumannia data, we used a stringent search protocol that may have missed some true polymorphisms but should have eliminated variation that was due to sequencing errors or cloning artifacts (see Materials and Methods). In total, we identified 104 single nucleotide polymorphisms (SNPs) and two insertion-deletion differences (indels) that fit these criteria. Details of the locations and types of polymorphisms are given in Table 3.

Table 3. Categorizations of Polymorphisms Detected in the Assembled Baumannia Genome

Since our DNA was isolated from the symbionts of hundreds of hosts, one major question is whether the observed polymorphisms were between symbionts within one host or between hosts. We used polymerase chain reaction surveys of individual insects to address this question. Of the 40 insects for which sequences were obtained individually for two loci, 35 showed identity to the consensus sequence for the Baumannia genome and five possessed the alternative alleles that were present as minority bases at four sites (two per fragment). No polymorphism was detected within individual hosts. Thus, the polymorphisms that we identified are real, and they reflect differences between symbionts of different hosts.

Since Baumannia can be treated as a maternally inherited marker, the finding of significant levels of polymorphism between hosts suggests that the sampled population contains individuals from two separate origins. This is somewhat in conflict with theories suggesting a single introduction of a small number of individuals into California [10] but is consistent with results from recent mitochondrial analysis [27].

Sequence polymorphisms have been detected in genomic studies of other insect endosymbionts [3,28]. The most relevant one for comparison to Baumannia is a study of the ant endosymbiont Blochmannia pennsylvanicus, although we note that the criteria they used for detecting a polymorphism were somewhat less stringent than ours [28]. The percentage of the SNPs that are in coding regions is different in the two species (81% in Baumannia and 65% in B. pennsylvanicus), but this is in line with differences in gene-coding density (88% in Baumannia and 76% in B. pennsylvanicus). For both species, the percentage of SNPs in protein-coding genes that represent synonymous differences is higher than expected from random changes given the genomic base compositions (52% in Baumannia and 62% in B. pennsylvanicus). This indicates ongoing purifying selection in both genomes. The most significant difference between the species is the higher ratio of transitions to transversions in B. pennsylvanicus (2.9 versus 1.4 in Baumannia; Table 3). We propose that this is due to the absence of mismatch repair genes in B. pennsylvanicus (as discussed above), which in other species leads to an increase in transition mutations [29]. An absence of mismatch repair would also explain the higher incidence of indels in B. pennsylvanicus.
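The transition-to-transversion ratio compared above can be computed directly from a list of substitutions. The following is a minimal sketch in Python, not the procedure used for Table 3; the function names and the toy SNP list are hypothetical and serve only to show how the ratio is defined.

```python
# Transitions stay within a chemical class (purine<->purine or
# pyrimidine<->pyrimidine); every other substitution is a transversion.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """Return True for A<->G or C<->T substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError(f"unexpected base in substitution: {ref}>{alt}")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions to transversions for an iterable of (ref, alt) pairs."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


# Hypothetical toy input (NOT data from Table 3): 3 transitions, 2 transversions.
toy_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(toy_snps))  # 1.5
```

Under random mutation there are twice as many possible transversions as transitions, so ratios well above 0.5 already indicate a transition bias.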

Metabolic Reconstructions Provide Insight into the Biology of Baumannia

Predictions of the metabolism of an organism from its genome sequence are critical to studies of uncultured organisms because of the difficulty of experimental studies. We have generated such a prediction for Baumannia (Figure 4). Although all such predictions should be viewed as hypotheses, not facts, they are greatly improved by having closely related species for which experimental studies are available. This is yet another advantage of working on the insect symbionts in the Gammaproteobacteria. For example, almost all Baumannia genes have clearcut orthologs in well-studied organisms such as Escherichia coli.

Figure 4. Predicted Metabolic Pathways in Baumannia and the Predicted Amino Acid Biosynthesis Pathways Encoded by the Partial Genome Sequence of Sulcia

Genes that are present are in red and the corresponding catalytic pathways are illustrated in solid black lines; the genes that are absent in the Baumannia genome and genes that have not been identified in the partial Sulcia genome are in gray, and the corresponding metabolic steps are illustrated in gray lines.

As expected, based on its small genome size, Baumannia has a relatively limited repertoire of synthetic capabilities. There are some important features of its predicted metabolism, and we discuss these in this and the next few sections of this paper, calling attention in particular to those of relevance to the host-symbiont interaction.

Baumannia is predicted to synthesize its own cell wall and plasma membrane, processes known to be lost in some intracellular species. It is, however, apparently unable to synthesize the lipopolysaccharide (LPS) commonly found in the outer membrane of other Gram-negative bacteria. The same is true for Buchnera species but not for Wigglesworthia and Blochmannia. The functional significance of this difference is unclear. On one hand, lipid A (the lipid component of the LPS) is generally highly toxic to animal cells; thus, LPS may be disadvantageous for endosymbionts and discarded during their evolution. Alternatively, the difference may reflect differences in the packaging of symbionts within the host bacteriocytes. Buchnera and Baumannia cells are surrounded by host-derived vesicles, while Wigglesworthia and Blochmannia directly contact the host cytoplasm.

The findings in regard to sugar metabolism are consistent with Baumannia acquiring sugars from its host and using them for energy metabolism. For import, a complete mannose phosphotransferase permease system is present including an Enzyme IIMan complex, the phosphotransferase system Enzyme I, and histidyl phosphorylatable protein PtsH. Imported sugars could then be fed into glycolysis. However, since the tricarboxylic acid cycle appears to be incomplete, apparently reducing power must come from other sources such as glycolysis itself, a pyruvate dehydrogenase complex, and an mqo type malate dehydrogenase. An intact electron transport chain consisting of NADH dehydrogenase I, cytochrome o oxidase, and ATP synthase is present.

The most striking aspects of the metabolism of Baumannia relate to what it apparently does and does not do in terms of the synthesis of essential nutrients missing from the hosts’ xylem diet.

Baumannia Is a Vitamin and Cofactor Machine

A large fraction of the Baumannia genome (83 genes, 13.7% of the total) encodes proteins predicted to have roles in pathways for the synthesis of a diverse set of vitamins, cofactors, prosthetic groups and related compounds (Figure 4, Table S1). These include thiamine (vitamin B1), riboflavin (vitamin B2), niacin (vitamin B3), pantothenic acid (vitamin B5), pyridoxine (vitamin B6), as well as biotin and folic acid. More detail on the pathways and the basis for the predictions is given below.

For riboflavin, folate, pyridoxal 5′-phosphate, and thiamine, complete pathways for de novo synthesis could be identified, together with Baumannia's ability to produce important precursors endogenously, including ribulose-5-phosphate, phosphoenolpyruvate, pyruvate, dihydroxyacetonephosphate, glyceraldehyde-3-phosphate, erythrose-4-phosphate, guanosine triphosphate, 5-aminoimidazole ribonucleotide, 5′-phosphoribosylglycinamide, and 5,10-methylene-tetrahydrofolate.

For some compounds, although homologs of enzymes carrying out key steps in other species could not be identified, candidates for alternatives are present suggesting the pathways are complete. For example, the step normally carried out by erythrose 4-phosphate dehydrogenase (Epd) in the pyridoxal 5′-phosphate pathway might be carried out by glyceraldehyde 3-phosphate dehydrogenase (GapA) as seen in some other species [30].

There are some compounds for which we could identify homologs of all known genes in biosynthetic pathways. However, some enzymes in these pathways are still unknown in any organism, and thus we could not identify them here. This is true for the pyrimidine phosphatase in the riboflavin pathway and the dihydroneopterin monophosphate dephosphorylase in the folic acid pathway. We believe it is likely that these pathways are complete in Baumannia and that, due to its ultracompact gene pool, Baumannia provides an ideal opportunity to identify the genes encoding the enzymes for these steps.

Perhaps most interesting are the pathways for which we could identify genes underlying many downstream steps but for which Baumannia would need to import some intermediates to feed those steps. For example, Baumannia encodes genes for the last three steps of siroheme synthesis and the last step of the heme O pathway, but candidate genes underlying the upstream steps could not be identified. Thus, Baumannia needs to import porphobilinogen and protoheme as substrates for these incomplete pathways. This pattern is particularly apparent in that Baumannia appears to be able to synthesize many cofactors from amino acids but is unable to synthesize the amino acid precursors. Examples of such pathways and the amino acid required include thiamine (tyrosine), biotin (alanine), pyridine nucleotides (aspartate), and folate and pyridoxal 5′-phosphate (glutamine and glutamate). This suggests that Baumannia must import these amino acids. The lack of amino acid biosynthesis pathways also makes it a necessity for Baumannia to import 2-ketovaline as a precursor for the synthesis of pantothenate and coenzyme A.

Given the diversity of vitamin and cofactor synthesis pathways that are present, we conclude that Baumannia provides its host with these compounds, which are scarce in the host's xylem diet. In this respect Baumannia is more similar to Wigglesworthia, the symbiont of tsetse flies, than to Buchnera.

Amino Acid Biosynthetic Pathways Are Generally Absent from Baumannia and Likely Are Found in Another Organism in the System

In contrast to what is seen for vitamin and cofactor synthesis, Baumannia is predicted to encode a very limited set of amino acid synthesis pathways. The few capabilities that are present include histidine biosynthesis, synthesis of methionine if external homoserine is provided, and the ability to make chorismate but not to use it as substrate for production of aromatic amino acids as in most bacterial species. Except for histidine, no complete pathways for the synthesis of any amino acids essential to the host are present.

The lack of amino acid synthesis pathways is apparently compensated by an ability to import amino acids from the environment using a general amino acid ABC transporter, an arginine/lysine ABC transporter, a lysine permease, and a proton/sodium-glutamate symport protein, although the gene for the latter is disrupted by one frameshift. The import of amino acids is apparently used not just for making proteins but also for energy metabolism. The latter is evident by the presence of the aspartate ammonia-lyase AspA, which could be used to convert L-aspartate to fumarate, which in turn can be fed into the tricarboxylic acid cycle.

The absence of essential amino acid synthesis pathways from Baumannia implies that both the host and Baumannia must obtain amino acids from some external source or sources. The sole diet of H. coagulata is xylem sap [10], in which essential amino acids are rare to absent; however, a substantial portion of the nitrogen in xylem occurs in the form of certain nonessential amino acids, including glutamine, aspartic acid, and asparagine (e.g., [11,14,31,32]). The essential amino acid synthesis pathways have not been found in any animal species studied to date, and nutritional studies in insects indicate that these compounds are required nutrients in insects as in mammals. Thus, the most plausible alternative is that another organism that is reliably present in the “ecosystem” of the host body is synthesizing the missing amino acids.

Analysis of Leftover Shotgun Sequence Reads Reveals the Presence of Amino Acid Synthesis Genes in Organisms Other than Baumannia

The most likely candidate for another organismal source of the amino acid synthesis pathways is Sulcia, the other coevolving symbiont found in bacteriomes mentioned above. Although we did not set out to sequence the Sulcia genome as part of this project, we realized we might have inadvertently acquired some of it since many sequence reads from the shotgun sequencing did not assemble with the Baumannia genome. These reads derived from cells of other organisms that were present in the tissue samples we used to isolate DNA for the Baumannia sequencing. An initial search of these sequence reads revealed the presence of homologs of genes with roles in the synthesis of essential amino acids. However, we could not conclude that these reads were from Sulcia, since there could have been cells of other organisms in the sample as well. To sort the extra reads into taxonomic bins, we adapted methods we have used to sort sequences from environmental shotgun sequencing projects (see Materials and Methods) and were able to assign non-Baumannia sequences to three main groups: host, Wolbachia related, and Sulcia related.

The finding of some Wolbachia in the sample was not surprising since rRNA surveys have shown that these alphaproteobacterial relatives of Rickettsia are found in many sharpshooters including H. coagulata. We note that we did not detect any sequences from the previously sequenced phytopathogen X. fastidiosa, which colonizes the surface of the foregut and is not present in the bacteriomes that we used for DNA isolation. In addition, although some of our sequences show high identity to sequences annotated as being from a phytoplasma, we believe this annotation is incorrect. The “phytoplasma” DNA was isolated from the saliva of the leafhopper Orosius albicinctus [33]. However, all the sequences in our sample that showed matches to sequences annotated as “phytoplasma”-like show phylogenetic relationships to the Bacteroidetes phylum. In addition, Sulcia is known to be a symbiont of species in the Deltocephalinae, the leafhopper subfamily containing O. albicinctus [9]. Thus, the putative “phytoplasma”-like sequences with matches in our sample are likely from the Sulcia symbiont of O. albicinctus. Why these sequences appeared in samples from salivary secretions is unclear.

Amino Acid Synthesis Pathways Are in Sulcia and Not Other Organisms in the Sample

Of the essential amino acid synthesis genes identified in the extra shotgun sequence reads, the vast majority (31 of 32) were assigned to the Sulcia bin. In contrast, only one gene (argB) was found in the Wolbachia bin and none were found in the host bin. We therefore sought to obtain as much sequence information as possible from the Sulcia symbionts in this system. First, we completed the sequence of all plasmid clones for which at least one read had been assigned to the Sulcia bin. In addition, we constructed a new library from tissue thought to contain more of the Sulcia symbiont than the library used for the initial sequencing. End-sequencing of this library identified some additional Sulcia-derived clones, and these, too, were completely sequenced. After conducting another round of assembly and assigning contigs and sequences to taxonomic bins, we were able to assign 146,384 bp of unique sequence to Sulcia. In these data, we identified 166 protein-coding genes. A phylogenetic analysis of a concatenated alignment of ribosomal proteins groups this protein set within the Bacteroidetes, thus supporting our assignment of these sequences to Sulcia (Figure 5).

Figure 5. Maximum-Likelihood Tree of Sulcia with Species in the Bacteroidetes and Chlorobi Phyla for which Complete Genomes Are Available

The tree was built using the PHYML program from the concatenated alignments of 34 ribosomal proteins. The bootstrap values are based upon 1,000 replications.

Although theoretically we could obtain a complete genome sequence of Sulcia by very deep sequencing of the samples we have obtained, this was not practical given limited funds. Nevertheless, analysis of the incomplete genome is quite revealing. First, among the 166 predicted proteins are 31 that underlie steps or whole pathways for the synthesis of amino acids essential for the host (Figure 4). These include the complete pathway of threonine biosynthesis and nearly complete pathways for the synthesis of leucine, valine, and isoleucine (the only gene not sampled is ilvE encoding the branched chain amino acid aminotransferase). In addition, multiple genes in the pathways for the synthesis of lysine, arginine, and tryptophan are present. We believe it is likely that these pathways are present and that the missing genes are in the unsequenced parts of the genome.

One question that remains is where Sulcia gets all of the nitrogen for these amino acids. One possibility is that it acquires and then converts nitrogenous organic compounds, particularly the nonessential amino acids known to be present in xylem (e.g., [14,32]). Alternatively, it is possible that Sulcia assimilates nitrogen from compounds such as ureides or ammonium, which are found in xylem (e.g., [14,32,34]). It has been proposed that X. fastidiosa, the plant pathogen vectored by H. coagulata, makes use of the ammonium in xylem as a nitrogen source [35]. Alternatively, Sulcia could garner inorganic nitrogen from the host, for which ammonium is a waste product [10,13]. Host waste is apparently a source of nitrogen for Blattabacterium, close relatives of Sulcia that are symbionts of cockroaches [36]. Although some insect genomes encode enzymes that may allow for this (e.g., glutamine synthetase or glutamate synthase [37]), it is not yet known whether these capabilities are present in sharpshooters. Whatever the source of its nitrogen, the genome analysis indicates that Sulcia apparently can make the amino acids required by the host.

The other abundant organism in our DNA was Wolbachia, an unlikely candidate as the source of these compounds. Wolbachia cannot be an obligate symbiont of sharpshooters because it infects only some individuals. Screening of individual H. coagulata indicates that some do not contain Wolbachia ([16], two of 40 insects were uninfected in our screens); and screening of individuals of the closely related species, Homalodisca literata (a synonym of H. lacerta), revealed no cases of Wolbachia infection. Also, although we have sampled only a fraction of the Wolbachia genome, the absence of amino acid synthesis pathways is consistent with the complete lack of essential amino acid biosynthesis in any of several sequenced Wolbachia genomes (two complete and many incomplete) [23,38,39].

We therefore conclude that Sulcia is most likely the sole provider of essential amino acids for H. coagulata. Thus, this member of the Bacteroidetes phylum appears to function in a similar way to Buchnera and Blochmannia species in the Gammaproteobacteria.

Sulcia and Baumannia Complement Each Other

We found very few genes in the partial Sulcia genome for vitamin or cofactor synthesis. Since the Sulcia genome appears to be quite small and we have apparently sampled a large fraction of it, we can speculate that few such genes are likely to be present. Indeed, in the 146 kb of sequence assigned to Sulcia, we have already found many of the core housekeeping types of genes (e.g., 40 ribosomal proteins and ten tRNA synthetases; Figure 6, Table S2). A very small genome size is consistent with phylogenetic reconstructions indicating that Sulcia is an extremely old symbiont, originating in the Permian [9].

Figure 6. The Distribution into Functional Role Categories of the 166 Predicted Genes Encoded in the 146,384-bp Partial Sequence of the Sulcia Genome

Data are shown for all ORFs that encode proteins longer than 45 amino acids that have BLASTP matches with an E-value less than 10^-3 to proteins in complete genomes. Different fragments of the same gene are counted as one gene in the chart.

The paucity of vitamin and cofactor synthesis pathways in Sulcia suggests the possibility that Sulcia and Baumannia play complementary, nonoverlapping roles in this symbiotic system. Not only do they appear to provide different resources for the host (Sulcia provides the amino acids and Baumannia the vitamins and cofactors) but, based on the current evidence, each does not provide the resources made by the other (Table 4). Indeed, the single essential amino acid biosynthetic pathway present in the Baumannia genome, that for histidine, is correspondingly the sole essential amino acid pathway with multiple steps for which no genes were detected in Sulcia. Thus, although Baumannia and the host apparently depend on Sulcia for the majority of essential amino acids, Sulcia and the host may depend on Baumannia for histidine. The complementarity between host and each symbiont extends to mutual dependence between the symbionts, which appear to depend on each other for these required compounds and for intermediates in other metabolic processes. For example, we predict that Sulcia can make homoserine, which, as discussed above, could be the substrate for methionine synthesis in Baumannia. In addition, the valine pathway in Sulcia could be the source of the 2-ketovaline for pantothenate and coenzyme A biosynthesis in Baumannia. Exchange of intermediates may be occurring for many aspects of metabolism. In the case of ubiquinone, a key component of the electron transport chain, Baumannia lacks genes encoding the needed biosynthetic enzymes and thus likely needs to import ubiquinone. The same appears to be true for menaquinone. Strikingly, even though only four of the 166 proteins in Sulcia are predicted to be involved in pathways of cofactor synthesis, two are for the production of menaquinone and ubiquinone, which are among the few cofactors whose synthesis is not carried out by Baumannia.

Table 4. The Complementarity of Amino Acid Biosynthesis and Cofactor Biosynthesis Pathways between Baumannia and Sulcia
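The nonoverlap argument can be restated as a set comparison. The sketch below is illustrative only: the dictionaries are a simplified, hypothetical encoding of the capabilities named in the text (the Sulcia entries come from a partial genome, and pathway completeness and required imports are ignored), and the function name is an assumption.

```python
# Simplified encoding of capabilities named in the text. These are
# genome-based predictions, and the Sulcia genome is only partially sampled.
baumannia = {
    "amino_acids": {"histidine"},
    "cofactors": {
        "thiamine", "riboflavin", "niacin", "pantothenic acid",
        "pyridoxine", "biotin", "folic acid",
    },
}
sulcia = {
    "amino_acids": {
        "threonine", "leucine", "valine", "isoleucine",
        "lysine", "arginine", "tryptophan",
    },
    "cofactors": {"menaquinone", "ubiquinone"},
}


def shared_capabilities(a: dict, b: dict) -> dict:
    """For each category, return the products that both symbionts can make."""
    return {category: a[category] & b[category] for category in a}


# Empty sets in every category mean no overlap under this encoding.
print(shared_capabilities(baumannia, sulcia))
# {'amino_acids': set(), 'cofactors': set()}
```

Because the Sulcia gene set is incomplete, an empty intersection here supports complementarity but cannot prove it.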

The coresidence of Sulcia and Baumannia, presented here from H. coagulata, is representative of a symbiotic pair that is distributed in most or all sharpshooters, a xylem-feeding insect group [9,16]. Thus, the possibility of metabolic complementarity that is suggested by the genome analyses reflects long coevolution of the three lineages represented by the insects and the two bacteria. The two symbionts occur in close proximity within the yellow portion of the host bacteriomes [16], and Baumannia cells often appear to adhere to the surface of the much larger Sulcia cells. This arrangement is illustrated in images from our in situ hybridizations for H. literata, a close relative of H. coagulata (Figure 7).

Figure 7. Baumannia and Sulcia Coinhabit the Bacteriomes of the Host Insects

Fluorescent in situ hybridizations were performed using oligonucleotide probes designed to hybridize selectively to the ribosomal RNA of Baumannia (green) and of Sulcia (red), respectively. Bacteriomes were obtained from Homalodisca literata (a very close relative of H. coagulata).

Conclusions

The glassy-winged sharpshooter, H. coagulata, feeds on xylem sap, which has very low levels of many nutrients required by insects and other animals [10]. Sequence analysis suggests the occurrence of an obligate symbiosis among three organisms: H. coagulata, the gamma-proteobacterial endosymbiont Baumannia, and the Bacteroidetes bacterial symbiont Sulcia. The two bacterial symbionts co-occur within the cytosol of sharpshooter bacteriocytes, sometimes residing within the same cells. The main function of Baumannia, as revealed by its genomic sequence, is to provide cofactors, especially water-soluble B-family vitamins, to the host. Partial sequences from Sulcia suggest that it provides essential amino acids to the host. The two endosymbionts appear to show functional complementarity and show little overlap in biosynthetic pathways, although full sequencing of the Sulcia genome is needed for a comprehensive view of the contributions of these two organisms. Our analysis shows the added insight possible from assigning sequences to organisms rather than treating environmental samples as a representative of a communal gene set.

Many questions remain regarding this fusion of separate lineages into a single metabolic system. For example, the different organisms must balance their contributions to the shared metabolism through coordinated growth and gene expression, and the mechanisms underlying this integration are not known. Also, these bacterial genomes have undergone major reduction in size while apparently maintaining their complementary capabilities, raising the question of how the steps in genome reduction have been coordinated. The sharpshooters and their obligate bacterial endosymbionts provide a simple model of genomic coevolution, a process that has likely been central in the evolution of most organisms living in stable associations.

Materials and Methods

Isolation of DNA for sequencing.

The material for sequencing was obtained from adults of H. coagulata collected in a lemon orchard in Riverside, California, in June 2001 and June 2004. The California population was introduced from the southeastern United States, Texas, or Mexico within the past 20 years [10,27]. DNA was isolated by first dissecting out the red portion of the bacteriome, which contains mainly Baumannia [16]. Approximately 200 adults were dissected in PA buffer and kept on ice. Immediately following the dissection, the bacteriome samples were disrupted with a pestle and were passaged in PA buffer through a 20-μm filter and then through an 11-μm filter, on ice. The filtering was intended to remove nuclei of the host insect cells. DNA was isolated from the filtered material using standard methods [16]. For the second sample, adults were collected in 2004 from the same lemon orchard as before; in this case, we attempted to increase representation of the Sulcia genome by dissecting out the yellow portion of the bacteriome from approximately 200 adults and then processing as for the first sample.

Library construction and shotgun sequencing.

DNA libraries were constructed by shearing the genomic DNA through nebulization, cutting DNA of a particular size out of an agarose gel, and cloning it into the pHOS2 plasmid vector. A total of 13,926 sequencing reads were generated from a 3- to 4-kb insert-sized library that was constructed using the first “red bacteriome” DNA sample. In addition, a large insert library (10- to 12-kb inserts) was constructed with DNA purified from the second “yellow bacteriome” DNA sample, and 3,396 reads were generated from this library. A further 2,986 sequencing reads were generated during closure efforts, both to close the Baumannia genome and to finish Sulcia clones.

Assembly and closure of the Baumannia genome.

The shotgun sequence data were assembled using the TIGR assembler [40], and the genome of Baumannia was closed using a combination of primer walking, multiplex PCR, and generation and sequencing of transposon-tagged libraries. Repeats were identified using RepeatFinder [41], and sequence and assembly of the repeats were confirmed using PCRs that spanned the repeat. The final assembly was checked such that every single base is covered by at least two clones and has been sequenced at least once in each direction. The average depth of coverage for the genome is 6.4. A putative origin of replication was identified by analysis of transitions in oligonucleotide skew [42].
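As a rough illustration of skew-based origin finding, the sketch below uses cumulative GC skew, a simpler and widely used relative of the oligonucleotide skew analysis cited above [42]; it is a named substitute, not a reimplementation of that method. The function names, the window size, and the toy sequence are assumptions, and the genome is treated as linear rather than circular.

```python
def cumulative_gc_skew(seq: str, window: int = 1000):
    """Return (window_end, cumulative_skew) pairs along a linear sequence.

    Per-window skew is (G - C) / (G + C); windows with no G or C add 0.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    total = 0.0
    points = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        points.append((start + len(chunk), total))
    return points


def putative_origin(seq: str, window: int = 1000) -> int:
    """Position where cumulative GC skew is lowest (a candidate origin)."""
    points = cumulative_gc_skew(seq, window)
    if not points:
        raise ValueError("empty sequence")
    return min(points, key=lambda point: point[1])[0]


# Hypothetical toy sequence: C-rich half followed by a G-rich half,
# so the skew changes sign at position 3000.
toy_genome = "C" * 3000 + "G" * 3000
print(putative_origin(toy_genome, window=1000))  # 3000
```

On real data the minimum is only a candidate position; the resolution is limited by the window size, and the result should be checked against other evidence.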

Identification and sequence of fragments of the genome of Sulcia.

Sequence reads from the shotgun sequencing data that did not map to the Baumannia genome were processed to sort them into candidate taxonomic groups (bins). First, they were assembled into contigs (although the vast majority of sequences did not assemble). Then each contig was analyzed to assign it to a putative bin using a combination of BLAST searches and phylogenetic trees. All sequences were searched with BLASTN and BLASTX against multiple sequence databases to identify top scoring matches. In addition, the BLASTX search results were used to identify possible proteins encoded in the sequences; these proteins were then used to build phylogenetic trees. The taxonomic identity of the nearest neighbor in these trees was extracted and stored. From these search results, sequences were assigned to taxonomic groupings of as low a taxonomic level as possible (e.g., if a protein grouped within a clade of sequences from insects, it was assigned to an insect bin). Examination of the results revealed that there were three major bins: insect, Wolbachia, and Bacteroidetes. There were also many sequences that were not readily assignable to one of these bins but could be assigned to higher level groups such as “Bacteria.” Based on rRNA studies and other work, we assumed that all sequences that were assigned to animals were likely from the host, and that all assigned to Bacteroidetes were likely from the Sulcia symbiont. Thus we refer to these bins as host and Sulcia, respectively.
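The idea of assigning a sequence to as low a taxonomic level as possible can be sketched as finding the deepest taxon shared by several lineages. This is a minimal illustration under stated assumptions: the text does not say how results from multiple proteins on one contig were combined, so the shared-prefix rule, the function name, and the example lineages are all hypothetical.

```python
def lowest_shared_taxon(lineages):
    """Return the deepest taxon common to all lineages.

    Each lineage is a list of taxon names ordered from the root toward the
    tip, e.g. the lineage of a protein's nearest neighbor in a tree.
    Returns "unassigned" when the lineages share no taxon at all.
    """
    lineages = list(lineages)
    if not lineages:
        raise ValueError("at least one lineage is required")
    shared = []
    for names in zip(*lineages):  # walk rank by rank from the root
        if len(set(names)) != 1:
            break
        shared.append(names[0])
    return shared[-1] if shared else "unassigned"


# Hypothetical nearest-neighbor lineages for two proteins on one contig.
contig_lineages = [
    ["Bacteria", "Bacteroidetes", "Flavobacteria"],
    ["Bacteria", "Bacteroidetes", "Bacteroidia"],
]
print(lowest_shared_taxon(contig_lineages))  # Bacteroidetes

# Conflicting phyla fall back to a higher-level bin.
mixed = [["Bacteria", "Proteobacteria"], ["Bacteria", "Bacteroidetes"]]
print(lowest_shared_taxon(mixed))  # Bacteria
```

The second call mirrors the case described above, in which sequences that do not fit a specific bin are still assignable to a higher-level group such as “Bacteria.”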

Initial analysis indicated that there were some genes encoding proteins predicted to be involved in amino acid synthesis in the Sulcia bin. In order to get more data from this taxonomic group, we decided to finish sequencing any clones that mapped to this group and that were at the end of contigs. In order to reduce the probability of wasting funds sequencing clones from another organism, we developed more stringent criteria for selecting which of the initial Sulcia bin sequences to characterize further. In these criteria, at least one of the following had to be true: (1) part of the contig contained a match of greater than 99% identity to the previously sequenced 16S to 23S rDNA of Sulcia [16]; (2) BLASTP searches of translated sequences against all complete microbial genomes gave a best match (based on E-value) to a member of the Bacteroidetes phylum; (3) predicted proteins branched with genes from Bacteroidetes species in neighbor joining trees; and (4) the sequences were significantly AT biased. For all sequence reads that were assigned to Sulcia using these criteria, if they were at the end of a contig, the remainder of the clone was sequenced.

After this additional sequencing, all sequence reads (including the new reads) that did not map to the Baumannia genome were reassembled using the Celera Assembler. From this new assembly, a “final” list of contigs likely to be from Sulcia was identified using criteria similar to those above: first, the fragment had to have ORFs that either had a best scoring BLAST hit to a sequence in the Bacteroidetes phylum or were positioned next to a Bacteroidetes gene in neighbor-joining trees of the proteins identified by BLASTP. In addition, GC content had to either be below 40% or the fragment had to have greater than 99% match for at least 200 bases to the previously sequenced 16S rDNA of the Bacteroidetes endosymbiont of H. coagulata. The low GC content criterion was applied to exclude contamination from free-living bacteria in our DNA sample.
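The final selection combines a phylogenetic condition with a composition-or-rDNA condition. A minimal sketch of that logic follows, assuming the BLAST and tree results have already been reduced to booleans; the function and parameter names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_final_sulcia_contig(
    seq: str,
    has_bacteroidetes_blast_hit: bool,
    has_bacteroidetes_tree_neighbor: bool,
    rdna_identity: float = 0.0,   # % identity to the 16S rDNA, 0 if no match
    rdna_match_length: int = 0,   # length of that match in bases
) -> bool:
    # First condition: best BLAST hit or tree position points to Bacteroidetes.
    phylo_ok = has_bacteroidetes_blast_hit or has_bacteroidetes_tree_neighbor
    # Second condition: GC below 40%, or >99% identity over at least 200 bases.
    rdna_ok = rdna_identity > 99.0 and rdna_match_length >= 200
    composition_ok = gc_fraction(seq) < 0.40 or rdna_ok
    return phylo_ok and composition_ok
```

For example, a contig with 50% GC would be rejected unless it carries the long, near-identical 16S rDNA match.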

From the new assemblies, contigs were also reassigned to the Wolbachia and host bins. To be considered to be from Wolbachia, the contig had to have not been assigned to Baumannia or Sulcia and had to have a top BLASTX hit to sequences from other Wolbachia. In total, 43,079 bp of unique sequence were assigned to Wolbachia. Another 120 kb worth of sequences and assemblies could not be assigned conclusively to Sulcia, Wolbachia, or Baumannia but had top BLAST hits to bacterial genes.

Genome annotation.

For the Baumannia genome, the GLIMMER program was used to identify putative CDSs [43]. Some putative CDSs were discarded if they had no significant sequence similarity to known genes and if they had significant overlaps with other CDSs with significant sequence similarity to known genes. Noncoding RNAs were identified as described previously [23]. Gene function annotation was based on results of BLASTP searches against Genpept and completed microbial genomes, and on hidden Markov model searches of the PFAM and TIGRFAM databases [44,45]. We identified only four genes in the Baumannia genome that did not have BLASTP matches to any protein entries in Genpept or proteins from publicly available complete genomic sequences (using an E-value cutoff of 0.01). GC skew and nucleotide composition analysis were performed as described previously [23].

For the partial Sulcia genome, ORFs were identified using the EMBOSS package [46]. Only those predicted peptides that were larger than 45 amino acids in length and that had BLASTP hits against microbial genome databases at E-value cutoff of 0.001 were kept as potential genes. The functional annotation of the Sulcia genes is mostly based on the top BLASTP hits.
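The length and E-value filter for candidate Sulcia genes can be written as a one-line test. The sketch below assumes each ORF has already been translated and paired with its best BLASTP E-value (or `None` if there was no hit); the function name is hypothetical, and treating the cutoff as inclusive is an assumption, since the text does not say.

```python
from typing import Optional


def keep_orf(peptide: str, best_evalue: Optional[float],
             min_length: int = 45, max_evalue: float = 1e-3) -> bool:
    """Keep peptides longer than 45 aa with a BLASTP hit at E-value <= 0.001."""
    if best_evalue is None:          # no BLASTP hit at all
        return False
    return len(peptide) > min_length and best_evalue <= max_evalue
```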

DNA polymorphisms in Baumannia

Polymorphism analysis was done on the results of the initial assemblies of the shotgun sequence data. Finished sequences were not used since these were based in part on targeted sequencing of select clones, which eliminates the random nature found in the shotgun data. SNPs and indels were identified using stringent criteria to identify regions with variation among sequence reads that were not likely due to sequencing errors.

A site was considered to have an SNP if (1) it had high sequence quality (≥40 PHRED score); (2) the assembly column in which it was found had more than 4-fold coverage; (3) it had differences among the reads at that position; and (4) the variable site was adjacent to at least three invariant positions on both sides. We used only positions that did not have variable flanking sites to prevent alignment errors from mistakenly causing us to score a site as polymorphic. SNPs in coding regions were characterized as synonymous (no amino acid change), conservative (common amino acid change), nonconservative (unusual amino acid change), or nonsense (stop codon), with a BLOSUM80 matrix being used to distinguish conservative from nonconservative.
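The four SNP criteria translate into a short filter over assembly columns. The sketch below makes several assumptions that are not stated in the text: each column is a list of `(base, phred)` pairs, only base calls with PHRED ≥ 40 are counted toward coverage and variation, and sites too close to the ends of the alignment to have three flanking columns are rejected. All names are hypothetical.

```python
from typing import List, Tuple

Column = List[Tuple[str, int]]  # one (base, PHRED score) pair per read


def high_quality_bases(column: Column, min_q: int = 40) -> List[str]:
    """Bases in this column whose PHRED score is at least min_q."""
    return [base for base, q in column if q >= min_q]


def is_variable(column: Column, min_q: int = 40) -> bool:
    """True if the high-quality reads disagree at this position."""
    return len(set(high_quality_bases(column, min_q))) > 1


def is_snp(columns: List[Column], i: int, min_q: int = 40,
           min_coverage: int = 4, flank: int = 3) -> bool:
    # Need `flank` columns on each side to check for invariant neighbors.
    if i - flank < 0 or i + flank >= len(columns):
        return False
    # Criteria 1 and 2: more than 4-fold coverage by high-quality calls.
    if len(high_quality_bases(columns[i], min_q)) <= min_coverage:
        return False
    # Criterion 3: the reads differ at this position.
    if not is_variable(columns[i], min_q):
        return False
    # Criterion 4: three invariant positions on both sides.
    neighbors = columns[i - flank:i] + columns[i + 1:i + 1 + flank]
    return not any(is_variable(c, min_q) for c in neighbors)
```

Requiring invariant flanks is what guards against misalignments: a shifted read tends to produce runs of adjacent mismatches rather than a single isolated one.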

Alignment gaps were scored as INDELs only if (1) the column with the gap had at least 4-fold coverage; (2) the aligned column had at least two high-quality sequence reads (≥40 PHRED score), and (3) three consecutive sequence reads on both sides of the gap(s) were of high quality (≥40 PHRED score).

To determine whether the polymorphisms occurred within or between individual host insects, DNA was extracted from the bacteriomes of 40 individual H. coagulata. These individuals were from the same collection that was used for the genomic sequences and had been frozen at −80 °C at the time of collection. PCR primers were designed for two regions (554 bp and 725 bp) that contained SNPs. These regions were amplified, the reaction products cleaned with Qiagen (Valencia, California, United States) miniprep columns, and the products were sequenced directly in both directions at the University of Arizona Genomic Analysis and Technology Center using an ABI 3730 sequencing machine.

We also used these 40 individuals to determine whether Wolbachia, which was detected in our sequence dataset, was present in all insects in the population. This determination was made on the basis of diagnostic PCR based on two genes, 16S rRNA and wsp, with the Baumannia SNP loci described above used as controls for DNA quality. Individuals with products for both Wolbachia loci were scored as positive, and individuals lacking both were scored as negative. (No individuals yielded one product and not the other.)
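The scoring rule for the diagnostic PCR is simple enough to state as code. This sketch is an assumption-laden illustration: the function name and return labels are hypothetical, and the handling of a failed Baumannia control or of discordant loci (which the text says were never observed) is a guess at how such cases would be flagged.

```python
def score_wolbachia(control_amplified: bool, has_16s: bool, has_wsp: bool) -> str:
    """Score one insect from its PCR results at the control and two Wolbachia loci."""
    if not control_amplified:
        return "uninformative"   # DNA quality control (Baumannia SNP loci) failed
    if has_16s and has_wsp:
        return "positive"        # both Wolbachia loci gave a product
    if not has_16s and not has_wsp:
        return "negative"        # neither locus gave a product
    return "discordant"          # one product only; not observed in this study
```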

Comparative genomics.

The predicted proteomes of Baumannia, Wigglesworthia, Blochmannia, and three strains of Buchnera were combined into one database. “All vs. all” BLASTP searches were performed for this database, and a Lek clustering algorithm was applied to cluster the peptides into gene families. An E-value cutoff of 1 × 10^−4 for the BLASTP results and a Lek similarity cutoff of 0.6 were chosen for the gene family clustering [47]. All the genes were searched against PFAM and the TIGRFAM database by HMMER, as well as against the reference genomes of E. coli K12 and Yersinia pestis KIM by BLASTP. Gene families were curated and functional roles were assigned according to the HMM and BLASTP search results.

Whole genome alignments of Baumannia versus Wigglesworthia, Blochmannia, three strains of Buchnera, E. coli, and Yersinia pestis were performed. Genome alignments were built using the BLASTP-based Java program DAGCHAINER [48] with an E-value cutoff of 1 × 10^−5.

Phylogenetic analysis.

A set of 45 ribosomal protein genes for which orthologs could be identified in Baumannia and other genomes of interest was selected. Each ortholog set was aligned using CLUSTALW, the alignments were concatenated, a maximum likelihood tree was built by PHYML, and 1,000 bootstrap replicates were performed [49]. The same approach was adapted for building the maximum likelihood tree from a set of 34 ribosomal protein genes for Sulcia and selected genomes of interest.
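The concatenation step can be illustrated in a few lines. The sketch below assumes each per-gene alignment is a dictionary mapping a taxon name to its aligned sequence and that every taxon is present in every alignment (as the ortholog selection above implies); the function name is hypothetical, and the alignment and tree building themselves (CLUSTALW, PHYML) are not reproduced here.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]],
                           taxa: List[str]) -> Dict[str, str]:
    """Join per-gene alignments end to end into one supermatrix row per taxon."""
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(alignments):
        # All rows of one alignment must have the same number of columns.
        lengths = {len(alignment[taxon]) for taxon in taxa}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} has rows of unequal length")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

A missing taxon raises `KeyError` here on purpose, since a gene without an ortholog in every genome would not have been selected in the first place.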

Pathway analysis.

The proteomes from Baumannia and Sulcia were searched against KEGG GENES/SSDB/KO [50] databases by BLASTP. Neighbor-joining trees were built by QUICKTREE [51], and EC numbers were assigned to the Baumannia proteins based on the nearest neighbor in the phylogenetic trees. The list of EC numbers present in the Baumannia genome was submitted to the KEGG Web site (http://www.genome.jp/kegg) to obtain all the potential pathways in the genome. Each pathway was examined and verified according to our genome annotations as well as the pathway descriptions in the EcoCyc database [52].

Fluorescent in situ hybridizations to visualize coresiding symbionts.

In order to obtain images of the symbionts and to verify the correspondence of 16S rDNA sequences to the organisms inhabiting bacteriomes, these structures were dissected from newly collected H. literata, a close relative of H. coagulata that occurs in Tucson, Arizona. (This procedure requires live material, and H. coagulata is a major pest that is not yet established in Arizona where this work was carried out.) Bacteriomes were disrupted in buffer, hybridized, and visualized as described in [9], except that mounts were made in antifading Vectashield medium (Vector Laboratories, Burlingame, California, United States), and the microscope and software used were Deltavision RT and SofWoRx V2.50 Suite V1.0 and Imaris V4.0 (Applied Precision, Issaquah, Washington, United States). The two oligonucleotide probes were specific to the homologous regions of the 16S rRNA and were labeled with different fluorescent dyes, enabling visualization of both symbionts within the same preparations.

Supporting Information

Table S1. Predicted Protein Coding Genes in the Baumannia Genome

Predicted functions and role categories are shown.

(659 KB DOC)

Table S2. Predicted Protein Coding Genes in the Sulcia Genome

Predicted functions and role categories are shown.

(206 KB DOC)

Accession Numbers

The genome sequence data have been submitted to multiple sequence databases. All sequence traces have been submitted to the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov) Trace Archive and are available at ftp://ftp.ncbi.nih.gov/pub/TraceDB/baumannia_cicadellinicola. The GenBank (http://www.ncbi.nlm.nih.gov/Genbank) closed, annotated genome accession number for Baumannia is CP000238 and annotated data accession number for Sulcia is AANL00000000. The mapping of the traces to the closed genome of Baumannia is in the NCBI Assembly Archive (http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi) with number pending.

The Institute for Genomic Research (TIGR) accession numbers available in GenBank accession number CP000238 are EnzymeIIMan complex (BCI_0449–0451), phosphotransferase system Enzyme I (BCI_0070), histidyl phosphorylatable protein PtsH (BCI_0069), mqo type malate dehydrogenase (BCI_0001), NADH dehydrogenase I (BCI_0369–0381), cytochrome o oxidase (BCI_0267–0269), ATP synthase (BCI_0140–0147), glyceraldehyde 3-phosphate dehydrogenase (GapA) (BCI_0443), general amino acid ABC transporter (BCI_0250, BCI_0207–0208), arginine/lysine ABC transporter (BCI_0323–0326), lysine permease (BCI_0393), proton/sodium-glutamate symport protein (BCI_0108), and aspartate ammonia-lyase AspA (BCI_0593).

Acknowledgments

We are grateful to Heather Costa for assistance with sharpshooter collecting in Riverside and to Colin Dale and Wendy Smith for help with collections in 2001. Howard Ochman gave advice on the DNA isolation. We would like to acknowledge the TIGR Bioinformatics and IT departments for general support, Claire Fraser-Liggett and Eric Eisenstadt for encouragement, and members of the Eisen research group, especially Martin Wu and Jonathan Badger, for providing bioinformatics tools.

Author contributions. DW, NAM, and JAE conceived and designed the experiments. SEVA, GHP, KLW, HK, LJT, JMZ, HED, PLT, NAM, and JAE performed the experiments. DW, SCD, NAM, and JAE analyzed the data. NAM and JAE contributed reagents/materials/analysis tools. DW, NAM, and JAE wrote the paper. DW and SCD participated in annotation. SEVA participated in library construction: small insert. GHP participated in library construction: large insert. KLW and HK participated in Baumannia closure. LJT participated in Sulcia closure. JMZ participated in closure. HED, PLT, and NAM participated in DNA isolation. PLT and NAM participated in fluorescent in situ hybridization microscopy.

Competing interests. The authors have declared that no competing interests exist.

  1. Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H (2000) Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81–86.
  2. Tamas I, Klasson L, Canback B, Naslund AK, Eriksson AS, et al. (2002) 50 Million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379.
  3. van Ham RC, Kamerbeek J, Palacios C, Rausell C, Abascal F, et al. (2003) Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci U S A 100:581–586.
  4. Russell JA, Latorre A, Sabater-Munoz B, Moya A, Moran NA (2003) Side-stepping secondary symbionts: Widespread horizontal transfer across and beyond the Aphidoidea. Mol Ecol 12:1061–1075.
  5. Buchner P (1965) Endosymbiosis of animals with plant microorganisms. New York: John Wiley. 909 p.
  6. Kaiser B (1980) Licht- und elecktronenmikroskopische untersuchung der Symbioten von Graphocephala coccinea Forstier (Homoptera: Jassidae). J Insect Morphol Embryol 9:79–88.
  7. von Dohlen CD, Kohler S, Alsop ST, McManus WR (2001) Mealybug beta-proteobacterial endosymbionts contain gamma-proteobacterial symbionts. Nature 412:433–436.
  8. Gomez-Valero LM, Soriano-Nvarro V, Perez-Brocal A, Heddi A, Moya JM, et al. (2004) Coexistence of Wolbachia with Buchnera aphidicola and a secondary symbiont in the aphid Cinara cedri. J Bacteriol 186:6626–6633.
  9. Moran NA, Tran P, Gerardo NM (2005) Symbiosis and insect diversification: An ancient symbiont of sap-feeding insects from the bacterial phylum Bacteroidetes. Appl Environ Microbiol 71:8802–8810.
  10. Redak RA, Purcell AH, Lopes JR, Blua MJ, Mizell RF, et al. (2004) The biology of xylem fluid-feeding insect vectors of Xylella fastidiosa and their relation to disease epidemiology. Annu Rev Entomol 49:243–270.
  11. Andersen P, Brodbeck B, Mizell R (1989) Metabolism of amino acids, organic acids and sugars extracted from the xylem fluid of four host plants by adult Homalodisca coagulata. Entomol Exp Appl 50:149–59.
  12. Anderson PC, Brodbeck BV, Mizell RF (1992) Feeding by the leafhopper, Homalodisca coagulata, in relation to xylem fluid chemistry and tension. J Insect Physiol 38:611–622.
  13. Andersen PC, Brodbeck B, Mizell RF (1995) Diurnal variation in tension, osmolarity and the composition of nitrogen and carbon assimilates in xylem fluid of Prunus persica, Vitis hybrid and Prunus communis. J Am Hort Sci 120:600–604.
  14. Malaguti D, Millard P, Wendler R, Hepburn A, Tagliavini M (2001) Translocation of amino acids in the xylem of apple (Malus domestica Borkh.) trees in spring as a consequence of both N remobilization and root uptake. J Exp Bot 52:1665–1671.
  15. Schjoerring JK, Husted S, Mäck G, Mattsson M (2002) The regulation of ammonium translocation in plants. J Exp Bot 53:883–890.
  16. Moran NA, Dale C, Dunbar H, Smith WA, Ochman H (2003) Intracellular symbionts of sharpshooters (Insecta: Hemiptera: Cicadellinae) from a distinct clade with a small genome. Environ Microbiol 5:116–126.
  17. Lerat E, Daubin V, Moran NA (2003) From gene trees to organismal phylogeny in prokaryotes: The case of the gamma-proteobacteria. PLoS Biol 1:e9 DOI: 10.1371/journal.pbio.0030316.
  18. Gil R, Silva FJ, Zientz E, Delmotte F, Gonzalez-Candelas F, et al. (2003) The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes. Proc Natl Acad Sci U S A 100:9388–9393.
  19. Moran NA (1996) Accelerated evolution and Muller’s ratchet in endosymbiotic bacteria. Proc Natl Acad Sci U S A 93:2873–2878.
  20. Itoh T, Martin W, Nei M (2002) Acceleration of genomic evolution caused by enhanced mutation rate in endocellular symbionts. Proc Natl Acad Sci 99:12944–12948.
  21. Herbeck JT, Funk DJ, Degnan PH, Wernegreen JJ (2003) A conservative test of genetic drift in the endosymbiotic bacterium Buchnera: Slightly deleterious mutations in the chaperonin groEL. Genetics 165:1651–1660.
  22. Rispe C, Delmotte F, van Ham RC, Moya A (2004) Mutational and selective pressures on codon and amino acid usage in Buchnera endosymbiotic bacteria of aphids. Genome Res 14:44–53.
  23. Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, et al. (2004) Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: A streamlined genome overrun by mobile genetic elements. PLoS Biol 2:e69 DOI: 10.1371/journal.pbio.0020069.
  24. Wernegreen JJ, Degnan PH, Lazarus AB, Palacios C, Bordenstein SR (2003) Genome evolution in an insect cell: Distinct features of an ant-bacterial partnership. Biol Bull 204:221–231.
  25. Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, et al. (2002) Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat Genet 32:402–407.
  26. Asai T, Sommer S, Bailone A, Kogoma T (1993) Homologous recombination-dependent initiation of DNA replication from DNA damage-inducible origins in Escherichia coli. EMBO J 12:3287–3295.
  27. Smith PT (2005) Mitochondrial DNA variation among populations of the glassy-winged sharpshooter, Homalodisca coagulata. J Insect Sci 5:41.
  28. Degnan PH, Lazarus AB, Wernegreen JJ (2005) Genome sequence of Blochmannia pennsylvanicus indicates parallel evolutionary trends among bacterial mutualists of insects. Genome Res 15:1023–1033.
  29. Miller JH (1996) Spontaneous mutators in bacteria: Insights into pathways of mutagenesis and repair. Annu Rev Microbiol 50:625–643.
  30. Yang Y, Zhao G, Man TK, Winkler ME (1998) Involvement of the gapA– and epd (gapB)-encoded dehydrogenases in pyridoxal 5′-phosphate coenzyme biosynthesis in Escherichia coli K-12. J Bacteriol 180:4294–4299.
  31. Brodbeck B, Mizell RF, Andersen P (1990) Amino acids as determinants of host preference for the xylem-feeding leafhopper, Homalodisca coagulata. Oecologia 83:338–345.
  32. Brodbeck BV, Andersen PC, Mizell RF (1999) Effects of total dietary nitrogen form on the development of xylophagous leafhoppers. Arch Insect Biochem Physiol 42:37–50.
  33. Melamed S, Tanne E, Ben-Haim R, Edelbaum O, Yogev D, et al. (2003) Identification and characterization of phytoplasmal genes, employing a novel method of isolating phytoplasmal genomic DNA. J Bacteriol 185:6513–6521.
  34. Suárez MF, Avila C, Gallardo F, Cantón R, Garcia-Gutiérrez A, et al. (2002) Molecular and enzymatic analysis of ammonium assimilation in woody plants. J Exp Bot 53:891–904.
  35. Simpson JG, Reinach FC, Arruda P, Abreu FA, Acencio M, et al. (2000) The genome sequence of the plant pathogen Xylella fastidiosa. Nature 406:151–157.
  36. Wren HN, Cochran DG (1987) Xanthine dehydrogenase activity in the cockroach endosymbiont Blattabacterium cuenoti (Mercier 1906) Hollande and Favre 1931 and in the cockroach fat body. Comp Biochem Physiol 88:1023–1026 B.
  37. Scaraffia PA, Isoe J, Murillo A, Wells MA (2005) Ammonia metabolism in Aedes aegypti. Insect Biochem Mol Biol 35:491–503.
  38. Foster J, Ganatra M, Kamal I, Ware J, Makarova K, et al. (2005) The Wolbachia genome of Brugia malayi: Endosymbiont evolution within a human pathogenic nematode. PLoS Biol 3:e121 DOI: 10.1371/journal.pbio.0030121.
  39. Salzberg SL, Hotopp JC, Delcher AL, Pop M, Smith DR, et al. (2005) Serendipitous discovery of Wolbachia genomes in multiple Drosophila species. Genome Biol 6(3):R23.
  40. Sutton GG, White O, Adams MD, Kerlavage AR (1995) TIGR Assembler: A new tool for assembling large shotgun sequencing projects. Genome Sci Technol 1:9–19.
  41. Volfovsky N, Haas BJ, Salzberg SL (2001) A clustering method for repeat analysis in DNA sequences. Genome Biol 2:research0027.1–27.11.
  42. Worning P, Jensen LJ, Hallin PF, Stærfeldt LJ, Ussery DW (2006) Origin of replication in circular prokaryotic chromosomes. Environ Microbiol 8:353–361.
  43. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26:544–548.
  44. Bateman A, Coin L, Durbin R, Finn RD, Hollich V, et al. (2004) The Pfam protein families database. Nucleic Acids Res 32:D138–D141.
  45. Haft DH, Selengut JD, White O (2003) The TIGRFAMs database of protein families. Nucleic Acids Res 31:371–373.
  46. Rice P, Longden I, Bleasby A (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet 16:276–277.
  47. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. (2001) The sequence of the human genome. Science 291:1304–1351.
  48. Haas BJ, Delcher AL, Wortman JR, Salzberg SL (2004) DAGchainer: A tool for mining segmental genome duplications and synteny. Bioinformatics 20:3643–3646.
  49. Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52:696–704.
  50. Kanehisa M, Goto S (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28:27–30.
  51. Howe K, Bateman A, Durbin R (2002) QuickTree: Building huge neighbour-joining trees of protein sequences. Bioinformatics 18:1546–1547.
  52. Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, et al. (2005) EcoCyc: A comprehensive database resource for Escherichia coli. Nucleic Acids Res 33:D334–D337.

The power of open access III: posting another of my open access publications here (Wolbachia genome paper)

I am posting another of my Open Access papers here – this was one of my first OA papers – a paper reporting sequencing and analysis of the first genome of a Wolbachia strain. Citation is:

Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, et al. (2004) Phylogenomics of the Reproductive Parasite Wolbachia pipientis wMel: A Streamlined Genome Overrun by Mobile Genetic Elements. PLoS Biol 2(3): e69 doi:10.1371/journal.pbio.0020069

Phylogenomics of the Reproductive Parasite Wolbachia pipientis wMel: A Streamlined Genome Overrun by Mobile Genetic Elements

Martin Wu1, Ling V. Sun2, Jessica Vamathevan1, Markus Riegler3, Robert Deboy1, Jeremy C. Brownlie3, Elizabeth A. McGraw3, William Martin4, Christian Esser4, Nahal Ahmadinejad4, Christian Wiegand4, Ramana Madupu1, Maureen J. Beanan1, Lauren M. Brinkac1, Sean C. Daugherty1, A. Scott Durkin1, James F. Kolonay1, William C. Nelson1, Yasmin Mohamoud1, Perris Lee1, Kristi Berry1, M. Brook Young1, Teresa Utterback1, Janice Weidman1, William C. Nierman1, Ian T. Paulsen1, Karen E. Nelson1, Hervé Tettelin1, Scott L. O’Neill2,3, Jonathan A. Eisen1*

1 The Institute for Genomic Research, Rockville, Maryland, United States of America, 2 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut, United States of America, 3 Department of Zoology and Entomology, School of Life Sciences, The University of Queensland, St Lucia, Queensland, Australia, 4 Institut für Botanik III, Heinrich-Heine Universität, Düsseldorf, Germany

The complete sequence of the 1,267,782 bp genome of Wolbachia pipientis wMel, an obligate intracellular bacterium of Drosophila melanogaster, has been determined. Wolbachia, which are found in a variety of invertebrate species, are of great interest due to their diverse interactions with different hosts, which range from many forms of reproductive parasitism to mutualistic symbioses. Analysis of the wMel genome, in particular phylogenomic comparisons with other intracellular bacteria, has revealed many insights into the biology and evolution of wMel and Wolbachia in general. For example, the wMel genome is unique among sequenced obligate intracellular species in both being highly streamlined and containing very high levels of repetitive DNA and mobile DNA elements. This observation, coupled with multiple evolutionary reconstructions, suggests that natural selection is somewhat inefficient in wMel, most likely owing to the occurrence of repeated population bottlenecks. Genome analysis predicts many metabolic differences with the closely related Rickettsia species, including the presence of intact glycolysis and purine synthesis, which may compensate for an inability to obtain ATP directly from its host, as Rickettsia can. Other discoveries include the apparent inability of wMel to synthesize lipopolysaccharide and the presence of the most genes encoding proteins with ankyrin repeat domains of any prokaryotic genome yet sequenced. Despite the ability of wMel to infect the germline of its host, we find no evidence for either recent lateral gene transfer between wMel and D. melanogaster or older transfers between Wolbachia and any host. Evolutionary analysis further supports the hypothesis that mitochondria share a common ancestor with the α-Proteobacteria, but shows little support for the grouping of mitochondria with species in the order Rickettsiales. With the availability of the complete genomes of both species and excellent genetic tools for the host, the wMel–D. melanogaster symbiosis is now an ideal system for studying the biology and evolution of Wolbachia infections.
Academic Editor: Nancy A. Moran, University of Arizona
Citation: Wu M, Sun LV, Vamathevan J, Riegler M, Deboy R, et al. (2004) Phylogenomics of the Reproductive Parasite Wolbachia pipientis wMel: A Streamlined Genome Overrun by Mobile Genetic Elements. PLoS Biol 2(3): e69 doi:10.1371/journal.pbio.0020069
Received: November 19, 2003; Accepted: January 6, 2004; Published: March 16, 2004
Copyright: © 2004 Wu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abbreviations: CDS, coding sequence; ENc, effective number of codons; IS, insertion sequence; LPS, lipopolysaccharide; RT, reverse transcription; TIGR, The Institute for Genomic Research
* To whom correspondence should be addressed. E-mail: jeisen@tigr.org

Introduction

Wolbachia are intracellular gram-negative bacteria that are found in association with a variety of invertebrate species, including insects, mites, spiders, terrestrial crustaceans, and nematodes. Wolbachia are transovarially transmitted from females to their offspring and are extremely widespread, having been found to infect 20%–75% of invertebrate species sampled (Jeyaprakash and Hoy 2000; Werren and Windsor 2000). Wolbachia are members of the Rickettsiales order of the α-subdivision of the Proteobacteria phylum and belong to the Anaplasmataceae family, with members of the genera Anaplasma, Ehrlichia, Cowdria, and Neorickettsia (Dumler et al. 2001). Six major clades (A–F) of Wolbachia have been identified to date (Lo et al. 2002): A, B, E, and F have been reported from insects, arachnids, and crustaceans; C and D from filarial nematodes.

Figure 1. Circular Map of the Genome and Genome Features

Circles correspond to the following: (1) forward strand genes; (2) reverse strand genes; (3) in red, genes with likely orthologs in both R. conorii and R. prowazekii; in blue, genes with likely orthologs in R. prowazekii, but absent from R. conorii; in green, genes with likely orthologs in R. conorii but absent from R. prowazekii; in yellow, genes without orthologs in either Rickettsia (Table S3); (4) plot of χ² analysis of nucleotide composition; phage regions are in pink; (5) plot of GC skew (G–C)/(G+C); (6) repeats over 200 bp in length, colored by category; (7) in green, transfer RNAs; (8) in blue, ribosomal RNAs; in red, structural RNA.

Wolbachia–host interactions are complex and range from mutualistic to pathogenic, depending on the combination of host and Wolbachia involved. Most striking are the various forms of “reproductive parasitism” that serve to alter host reproduction in order to enhance the transmission of this maternally inherited agent. These include parthenogenesis (infected females reproducing in the absence of mating to produce infected female offspring), feminization (infected males being converted into functional phenotypic females), male-killing (infected male embryos being selectively killed), and cytoplasmic incompatibility (in its simplest form, the developmental arrest of offspring of uninfected females when mated to infected males) (O’Neill et al. 1997a).
Wolbachia have been hypothesized to play a role in host speciation through the reproductive isolation they generate in infected hosts (Werren 1998). They also provide an intriguing array of evolutionary solutions to the genetic conflict that arises from their uniparental inheritance. These solutions represent alternatives to classical mutualism and are often of more benefit to the symbiont than the host that is infected (Werren and O’Neill 1997). From an applied perspective, it has been proposed that Wolbachia could be utilized to either suppress pest insect populations or sweep desirable traits into pest populations (e.g., the inability to transmit disease-causing pathogens) (Sinkins and O’Neill 2000). Moreover, they may provide a new approach to the control of human and animal filariasis. Since the nematode worms that cause filariasis have an obligate symbiosis with mutualistic Wolbachia, treatment of filariasis with simple antibiotics that target Wolbachia has been shown to eliminate microfilaria production as well as ultimately killing the adult worm (Taylor et al. 2000; Taylor and Hoerauf 2001).
Despite their common occurrence and major effects on host biology, little is currently known about the molecular mechanisms that mediate the interactions between Wolbachia and their invertebrate hosts. This is partly due to the difficulty of working with an obligate intracellular organism that is difficult to culture and hard to obtain in quantity. Here we report the completion and analysis of the genome sequence of Wolbachia pipientis wMel, a strain from the A supergroup that naturally infects Drosophila melanogaster (Zhou et al. 1998).

Table 1. wMel Genome Features

Results/Discussion

Genome Properties

The wMel genome was determined to be a single circular molecule of 1,267,782 bp with a G+C content of 35.2%. This assembly is very similar to the genetic and physical map of the closely related strain wMelPop (Sun et al., 2003). The genome does not exhibit the GC skew pattern typical of some prokaryotic genomes (Figure 1) that have two major shifts, one near the origin and one near the terminus of replication. Therefore, identification of a putative origin of replication and the assignment of basepair 1 were based on the location of the dnaA gene. Major features of the genome and of the annotation are summarized in Table 1 and Figure 1.
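For readers who want to see what the G+C content and GC skew calculations look like in practice, here is a minimal sketch using the (G–C)/(G+C) definition given in the Figure 1 legend. The function names and the non-overlapping window scheme are assumptions for illustration, not the procedure used for the published figure.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 if the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC skew for each full window, as (start position, skew) pairs."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

In a genome with a typical skew pattern, plotting the windowed values around the chromosome shows a sign change near the origin and another near the terminus; the absence of such clean shifts in wMel is why the dnaA gene was used to place basepair 1.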

Repetitive and Mobile DNA

The most striking feature of the wMel genome is the presence of very large amounts of repetitive DNA and DNA corresponding to mobile genetic elements, which is unique for an intracellular species. In total, 714 repeats of greater than 50 bp in length, which can be divided into 158 distinct families (Table S1), were identified. Most of the repeats are present in only two copies in the genome, although 39 are present in three or more copies, with the most abundant repeat being found in 89 copies. We focused our analysis on the 138 repeats of greater than 200 bp (Table 2). These were divided into 19 families based upon sequence similarity to each other. These repeats were found to make up 14.2% of the wMel genome. Of these repeat families, 15 correspond to likely mobile elements, including seven types of insertion sequence (IS) elements, four likely retrotransposons, and four families without detectable similarity to known elements but with many hallmarks of mobile elements (flanked by inverted repeats, present in multiple copies) (Table 2). One of these new elements (repeat family 8) is present in 45 copies in the genome. It is likely that many of these elements are not able to autonomously transpose since many of the transposase genes are apparently inactivated by mutations or the insertion of other transposons (Table S2). However, some are apparently recently active since there are transposons inserted into at least nine genes (Table S2), and the copy number of some repeats appears to be variable between Wolbachia strains (M. Riegler et al., personal communication). Thus, many of these repetitive elements may be useful markers for strain discrimination. In addition, the mobile elements likely contribute to generating the diversity of phenotypically distinct Wolbachia strains (e.g., mod strains [McGraw et al. 2001]) by altering or disrupting gene function (Table S2).

Table 2. wMel DNA Repeats of Greater than 200 bp

Three prophage elements are present in the genome. One is a small pyocin-like element made up of nine genes (WD00565–WD00575). The other two are closely related to and exhibit extensive gene order conservation with the WO phage described from Wolbachia sp. wKue (Masui et al. 2001) (Figure 2). Thus, we have named them wMel WO-A and WO-B, based upon their location in the genome. wMel WO-B has undergone a major rearrangement and translocation, suggesting it is inactive. Phylogenetic analysis indicates that wMel WO-B is more closely related to the wKue WO than to wMel WO-A (Figure S1). Thus, wMel WO-A likely represents either a separate insertion event in the Wolbachia lineage or a duplication that occurred prior to the separation of the wMel and wKue lineages. Phylogenetic analysis also confirms the proposed mosaic nature of the WO phage (Masui et al. 2001), with one block being closely related to lambdoid phage and another to P2 phage (data not shown).

Genome Structure: Rearrangements, Duplications, and Deletions

The irregular pattern of GC skew in wMel is likely due in part to intragenomic rearrangements associated with the many DNA repeat elements. Comparison with a large contig from a Wolbachia species that infects Brugia malayi is consistent with this (Ware et al. 2002) (Figure 3). While only translocations are seen in this plot, genetic comparisons reveal that inversions also occur between strains (Sun et al., 2003), which is consistent with previous studies of prokaryotic genomes that have found that the most common large-scale rearrangements are inversions that are symmetric around the origin of DNA replication (Eisen et al. 2000). The occurrence of frequent rearrangement events during Wolbachia evolution is supported by the absence of any large-scale conserved gene order with Rickettsia genomes. The rearrangements in Wolbachia likely correspond with the introduction and massive expansion of the repeat element families that could serve as sites for intragenomic recombination, as has been shown to occur for some other bacterial species (Parkhill et al. 2003). The rearrangements in wMel may have fitness consequences since several classes of genes often found in clusters are generally scattered throughout the wMel genome (e.g., ABC transporter subunits, Sec secretion genes, rRNA genes, F-type ATPase genes).

Figure 2. Phage Alignments and Neighboring Genes

Conserved gene order between the WO phage in Wolbachia sp. wKue and prophage regions of wMel. Putative proteins in wKue (Masui et al. 2001) were searched using TBLASTN against the wMel genome. Matches with an E-value of less than 1e−15 are linked by connecting lines. CDSs are colored as follows: brown, phage structural or replication genes; light blue, conserved hypotheticals; red, hypotheticals; magenta, transposases or reverse transcriptases; blue, ankyrin repeat genes; light gray, radC; light green, paralogous genes; gold, others. The regions surrounding the phage are shown because they have some unusual features relative to the rest of the genome. For example, WO-A and WO-B are each flanked on one side by clusters of genes in two paralogous families that are distantly related to phage repressors. In each of these clusters, a homolog of the radC gene is found. A third radC homolog (WD1093) in the genome is also flanked by a member of one of these gene families (WD1095). While the connection between radC and the phage is unclear, the multiple copies of the radC gene and the members of these paralogous families may have contributed to the phage rearrangements described above.
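
The E-value filter described in this legend can be sketched as follows, assuming the TBLASTN results were saved in BLAST's standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The function name, the query identifiers, and the example rows are hypothetical and are not taken from the study.

```python
def filter_hits(tabular_text, max_evalue=1e-15):
    """Keep hits whose E-value is strictly below max_evalue.

    Returns tuples of (query, subject, subject_start, subject_end, evalue).
    """
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < max_evalue:
            hits.append((fields[0], fields[1],
                         int(fields[8]), int(fields[9]), evalue))
    return hits


# Hypothetical rows standing in for wKue phage proteins searched against wMel.
rows = [
    ("wKue_gp01", "wMel", 85.0, 200, 30, 0, 1, 200, 600100, 600699, "3e-40", 150),
    ("wKue_gp02", "wMel", 40.0, 90, 54, 2, 5, 94, 650000, 650269, "2e-08", 55),
    ("wKue_gp03", "wMel", 70.0, 150, 45, 1, 1, 150, 601000, 601449, "1e-15", 80),
]
text = "\n".join("\t".join(str(x) for x in row) for row in rows)
hits = filter_hits(text)
```

Only the first row passes here: the third row sits exactly at the cutoff, and the legend specifies matches with an E-value of less than 1e−15. The retained subject coordinates are what would be joined by connecting lines in a plot of this kind.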

Although the common ancestor of Wolbachia and Rickettsia likely already had a reduced, streamlined genome, wMel has lost additional genes since that time (Table S3). Many of these recent losses are of genes involved in cell envelope biogenesis in other species, including most of the machinery for producing lipopolysaccharide (LPS) components and the alanine racemase that supplies D-alanine for cell wall synthesis. In addition, some other genes that may have once been involved in this process are present in the genome, but defective (e.g., mannose-1-phosphate guanylyltransferase, which is split into two coding sequences [CDSs], WD1224 and WD1227, by an IS5 element) and are likely in the process of being eliminated. The loss of cell envelope biogenesis genes has also occurred during the evolution of the Buchnera endosymbionts of aphids (Shigenobu et al. 2000; Moran and Mira 2001). Thus, wMel and Buchnera have lost some of the same genes separately during their reductive evolution. Such convergence means that attempts to use gene content to infer evolutionary relatedness need to be interpreted with caution. In addition, since Anaplasma and Ehrlichia also apparently lack genes for LPS production (Lin and Rikihisa 2003), it is likely that the common ancestor of Wolbachia, Ehrlichia, and Anaplasma was unable to synthesize LPS. Thus, the reports that Wolbachia-derived LPS-like compounds are involved in the immunopathology of filarial nematode disease in mammals (Taylor 2002) indicate either that these Wolbachia have acquired genes for LPS synthesis or that the reported LPS-like compounds are not homologous to LPS.

Figure 3. Alignment of wMel with a 60 kbp Region of the Wolbachia from B. malayi

The figure shows BLASTN matches (green) and whole-proteome alignments (red) that were generated using the “promer” option of the MUMmer software (Delcher et al. 1999). The B. malayi region is from a BAC clone (Ware et al. 2002). Note the regions of alignment broken up by many rearrangements and the presence of repetitive sequences at the regions of the breaks.

Despite evident genome reduction in wMel and in contrast to most small-genomed intracellular species, gene duplication appears to have continued, as over 50 gene families have apparently expanded in the wMel lineage relative to that of all other species (Table S4). Many of the pairs of duplicated genes are encoded next to each other in the genome, suggesting that they arose by tandem duplication events and may simply reflect transient duplications in evolution (deletion is common when there are tandem arrays of genes). Many others are components of mobile genetic elements, indicating that these elements have expanded significantly after entering the Wolbachia evolutionary lineage. Other duplications that could contribute to the unique biological properties of wMel include that of the mismatch repair gene mutL (see below) and that of many hypothetical and conserved hypothetical proteins.
One duplication of particular interest is that of wsp, which is a standard gene for strain identification and phylogenetic reconstruction in Wolbachia (Zhou et al. 1998). In addition to the previously described wsp (WD0159), wMel encodes two wsp paralogs (WD0009 and WD0489), which we designate as wspB and wspC, respectively. While these paralogs are highly divergent from wsp (protein identities of 19.7% and 23.5%, respectively) and do not amplify using the standard wsp PCR primers (Braig et al. 1998; Zhou et al. 1998), their presence could lead to some confusion in classification and identification of Wolbachia strains. This has apparently occurred in one study of Wolbachia strain wKueYO, for which the reported wsp gene (gbAB045235) is actually an ortholog of wspB (99.8% sequence identity and located at the end of the virB operon [Masui et al. 2000]) and not an ortholog of the wsp gene. Considering that the wsp gene has been extremely informative for discriminating between strains of Wolbachia, we designed PCR primers to the wMel wspB gene to amplify and then sequence the orthologs from the related wRi and wAlbB Wolbachia strains from Drosophila simulans and Aedes albopictus, respectively, as well as the Wolbachia strain that infects the filarial nematode Dirofilaria immitis to determine the potential utility of this locus for strain discrimination. A comparison of genetic distances between the wsp and wspB genes for these different taxa indicates that overall the wspB gene appears to be evolving at a faster rate than wsp and, as such, may be a useful additional marker for discriminating between closely related Wolbachia strains (Table S5).
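
Protein identities and genetic distances such as those quoted above are computed from pairwise alignments. The sketch below shows one common convention (identical residues divided by aligned, gap-free columns) together with the corresponding uncorrected p-distance. The exact convention and distance model behind the reported values are not stated here, so this is an assumption, and the two aligned sequences are made up.

```python
def percent_identity(a, b):
    """Percent identity over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def p_distance(a, b):
    """Uncorrected proportion of differing sites (1 - identity)."""
    return 1.0 - percent_identity(a, b) / 100.0


# Hypothetical aligned fragments, for illustration only.
seq1 = "MKT-AYLLVG"
seq2 = "MKSQAYILVG"
identity = percent_identity(seq1, seq2)  # 7 matches over 9 gap-free columns
distance = p_distance(seq1, seq2)
```

Comparing such distances for wsp and wspB across the same set of strains is the kind of calculation that would indicate which locus is evolving faster.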

Inefficiency of Selection in wMel

The fraction of the genome that is repetitive DNA and the fraction that corresponds to mobile genetic elements are among the highest for any prokaryotic genome. This is particularly striking compared to the genomes of other obligate intracellular species such as Buchnera, Rickettsia, Chlamydia, and Wigglesworthia, which all have very low levels of repetitive DNA and mobile elements. The recently sequenced genome of the intracellular pathogen Coxiella burnetii (Seshadri et al. 2003) has both a streamlined genome and moderate amounts of repetitive DNA, although much less than wMel. The paucity of repetitive DNA in these and other intracellular species is thought to be due to a combination of lack of exposure to other species, thereby limiting introduction of mobile elements, and genome streamlining (Mira et al. 2001; Moran and Mira 2001; Frank et al. 2002). We examined the wMel genome to try to understand the origin of the repetitive and mobile DNA and to explain why such repetitive/mobile DNA is present in wMel, but not other streamlined intracellular species.
We propose that the mobile DNA in wMel was acquired some time after the separation of the Wolbachia and Rickettsia lineages but before the radiation of the Wolbachia group. The acquisition of these elements after the separation of the Wolbachia and Rickettsia lineages is suggested by the fact that most do not have any obvious homologous sequences in the genomes of other α-Proteobacteria, including the closely related Rickettsia spp. Additional evidence for some acquisition of foreign DNA after the Wolbachia–Rickettsia split comes from phylogenetic analysis of those genes present in wMel, but not in the two sequenced rickettsial genomes (see Table S3; unpublished data). The acquisition prior to the radiation of Wolbachia is suggested by two lines of evidence. First, many of the elements are found in the genome of the distantly related Wolbachia of the nematode B. malayi (see Figure 3; unpublished data). In addition, genome analysis reveals that these elements do not have significantly anomalous nucleotide composition or codon usage compared to the rest of the genome. In fact, there are only four regions of the genome with significantly anomalous composition, comprising in total only approximately 17 kbp of DNA (Table 3). The lack of anomalous composition suggests either that any foreign DNA in wMel was acquired long enough ago to allow it to “ameliorate” and become compositionally similar to endogenous Wolbachia DNA (Lawrence and Ochman 1997, 1998) or that any foreign DNA that is present was acquired from organisms with similar composition to endogenous wMel genes. Owing to their potential effects on genome evolution (insertional mutagenesis, catalyzing genome rearrangements), we propose that the acquisition and maintenance of these repetitive and mobile elements by wMel have played a key role in shaping the evolution of Wolbachia.
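
One simple way to look for regions of anomalous nucleotide composition is to compare the G+C fraction of each window with the genome-wide distribution. The sketch below uses a z-score cutoff on windowed G+C content. It is an illustrative stand-in, not the statistic used to produce Table 3, and the window size, cutoff, function names, and toy sequence are all assumptions.

```python
from statistics import mean, pstdev


def gc_fraction(seq):
    """Fraction of unambiguous bases that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def anomalous_windows(seq, window=1000, step=1000, z_cutoff=3.0):
    """Return (start, gc, z) for windows whose G+C z-score exceeds the cutoff."""
    starts = list(range(0, len(seq) - window + 1, step))
    values = [gc_fraction(seq[s:s + window]) for s in starts]
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # perfectly uniform composition: nothing to flag
    flagged = []
    for s, v in zip(starts, values):
        z = (v - mu) / sd
        if abs(z) > z_cutoff:
            flagged.append((s, v, z))
    return flagged


# Toy genome (hypothetical): 19 AT-rich windows and one GC-rich island.
background = "AATGCT" * 10   # 60 bp, G+C = 1/3
island = "GCGCGA" * 10       # 60 bp, G+C = 5/6
toy = background * 12 + island + background * 7
flagged = anomalous_windows(toy, window=60, step=60)
```

A composition-based screen of this kind can only detect foreign DNA that still differs from its host genome, which is why amelioration, or acquisition from a compositionally similar donor, would leave the elements invisible to it.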

Table 3. Regions of Anomalous Nucleotide Composition in the wMel Genome

It is likely that much of the mobile/repetitive DNA was introduced via phage, given that three prophage elements are present; experimental studies have shown active phage in some Wolbachia (Masui et al. 2001) and Wolbachia superinfections occur in many hosts (e.g., Jamnongluk et al. 2002), which would allow phage to move between strains. Whatever the mechanism of introduction, the persistence of the repetitive elements in wMel in the face of apparently strong pressures for streamlining is intriguing. One explanation is that wMel may be getting a steady infusion of mobile elements from other Wolbachia strains to counteract the elimination of elements by selection for genome streamlining. This would explain the absence of anomalous nucleotide composition of the elements. However, we believe that a major contributing factor to the presence of all the repetitive/mobile DNA in wMel is that natural selection is generally less efficient in wMel, and possibly in Wolbachia in general, than in other species. This inefficiency would limit the ability to eliminate repetitive DNA. A general inefficiency of natural selection (especially purifying selection) has been suggested previously for intracellular bacteria, based in part on observations that these bacteria have higher evolutionary rates than free-living bacteria (e.g., Moran 1996). We also find a higher evolutionary rate for wMel than that of the closely related intracellular Rickettsia, which themselves have higher rates than free-living α-Proteobacteria (Figure 4). Additionally, codon bias in wMel appears to be driven more by mutation or drift than selection (Figure S2), as has been reported for Buchnera species and was suggested to be due to inefficient purifying selection (Wernegreen and Moran 1999). Such inefficiencies of natural selection are generally due to an increase in the relative contribution of genetic drift and mutation as compared to natural selection (Eiglmeier et al. 2001; Lawrence 2001; Parkhill et al. 2001). Below we discuss different possible explanations for the inefficiency of selection in wMel, especially in comparison to other intracellular bacteria.

Figure 4. Long Evolutionary Branches in wMel

Maximum-likelihood phylogenetic tree constructed on concatenated protein sequences of 285 orthologs shared among wMel, R. prowazekii, R. conorii, C. crescentus, and E. coli. The location of the most recent common ancestor of the α-Proteobacteria (Caulobacter, Rickettsia, Wolbachia) is defined by the outgroup E. coli. The unit of branch length is the number of changes per amino acid. Overall, the amino acid substitution rate in the wMel lineage is about 63% higher than that of C. crescentus, a free-living α-Proteobacterium. wMel has evolved at a slightly higher rate than the Rickettsia spp., close relatives that are also obligate intracellular bacteria that have undergone accelerated evolution themselves. This higher rate is likely to be due in part to an increase in the rate of slightly deleterious mutations, although we have not ruled out the possibility of G+C content effects on the branch lengths.
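
A relative rate such as the one quoted in this legend can be derived by comparing path lengths from the shared ancestor to each tip, since both lineages have had the same amount of time to accumulate changes. The sketch below shows the arithmetic only. The branch lengths are hypothetical placeholders, not values read from Figure 4, and the function names are assumptions.

```python
def root_to_tip(branch_lengths):
    """Sum the branch lengths along the path from the common ancestor to a tip."""
    return sum(branch_lengths)


def percent_faster(path_a, path_b):
    """How much longer (in percent) path_a is than path_b."""
    return (path_a / path_b - 1.0) * 100.0


# Hypothetical branch lengths in changes per amino acid site.
lineage_a = root_to_tip([0.10, 0.20])  # e.g., an intracellular lineage
lineage_b = root_to_tip([0.05, 0.15])  # e.g., a free-living lineage
excess = percent_faster(lineage_a, lineage_b)  # about 50 for these toy values
```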

Low rates of recombination, such as occur in centromeres and the human Y chromosome, can lead to inefficient selection because of the linkage among genes. This has been suggested to be occurring in Buchnera species because these species do not encode homologs of RecA, which is the key protein in homologous recombination in most species (Shigenobu et al. 2000). The absence of recombination in Buchnera is supported by the lack of genome rearrangements in their recent evolution (Tamas et al. 2002). Additionally, there is apparently little or no gene flow into Buchnera strains. In contrast, wMel encodes the necessary machinery for recombination, including RecA (Table S6), and has experienced both extensive intragenomic homologous recombination and introduction of foreign DNA. Therefore, the unusual genome features of wMel are unlikely to be due to low levels of recombination.
Another possible explanation for inefficient selection is high mutation rates. It has been suggested that the higher evolutionary rates in intracellular bacteria are the result of high mutation rates that are in turn due to the loss of genes for DNA repair processes (e.g., Itoh et al. 2002). This is likely not the case in wMel since its genome encodes proteins corresponding to a broad suite of DNA repair pathways including mismatch repair, nucleotide excision repair, base excision repair, and homologous recombination (Table S6). The only noteworthy DNA repair gene absent from wMel and present in the more slowly evolving Rickettsia is mfd, which is involved in targeting DNA repair to the transcribed strand of actively transcribing genes in other species (Selby et al. 1991). However, this absence is unlikely to contribute significantly to the increased evolutionary rate in wMel, since defects in mfd do not lead to large increases in mutation rates in other species (Witkin 1994). The presence of mismatch repair genes (homologs of mutS and mutL) in wMel is particularly relevant since this pathway is one of the key steps in regulating mutation rates in other species. In fact, wMel is the first bacterial species to be found with two mutL homologs. Overall, examination of the predicted DNA repair capabilities of bacteria (Eisen and Hanawalt 1999) suggests that the connection between evolutionary rates in intracellular species and the loss of DNA repair processes is spurious. While many intracellular species have lost DNA repair genes in their recent evolution, different species have lost different genes and some, such as wMel and Buchnera spp., have kept the genes that likely regulate mutation rates. 
In addition, some free-living species without high evolutionary rates have lost some of the same pathways lost in intracellular species, while many free-living species have lost key pathways resulting in high mutation rates (e.g., Helicobacter pylori has apparently lost mismatch repair [Eisen 1997, Eisen 1998b; Bjorkholm et al. 2001]). Given that intracellular species tend to have small genomes and have lost genes from every type of biological process, it is not surprising that many of them have lost DNA repair genes as well.
We believe that the most likely explanations for the inefficiency of selection in wMel involve population-size related factors, such as genetic drift and the occurrence of population bottlenecks. Such factors have also been shown to likely explain the high evolutionary rates in other intracellular species (Moran 1996; Moran and Mira 2001; van Ham et al. 2003). Wolbachia likely experience frequent population bottlenecks both during transovarial transmission (Boyle et al. 1993) and during cytoplasmic incompatibility mediated sweeps through host populations. The extent of these bottlenecks may be greater than in other intracellular bacteria, which would explain why wMel has both more repetitive and mobile DNA than other such species and a higher evolutionary rate than even the related Rickettsia spp. Additional genome sequences from other Wolbachia will reveal whether this is a feature of all Wolbachia or only certain strains.

Mitochondrial Evolution

There is a general consensus in the evolutionary biology literature that the mitochondria evolved from bacteria in the α-subgroup of the Proteobacteria phylum (e.g., Lang et al. 1999). Analysis of complete mitochondrial and bacterial genomes has very strongly supported this hypothesis (Andersson et al. 1998, 2003; Muller and Martin 1999; Ogata et al. 2001). However, the exact position of the mitochondria within the α-Proteobacteria is still debated. Many studies have placed them in or near the Rickettsiales order (Viale and Arakaki 1994; Gupta 1995; Sicheritz-Ponten et al. 1998; Lang et al. 1999; Bazinet and Rollins 2003). Some studies have further suggested that mitochondria are a sister taxon to the Rickettsia genus within the Rickettsiaceae family and thus more closely related to Rickettsia spp. than to species in the Anaplasmataceae family such as Wolbachia (Karlin and Brocchieri 2000; Emelyanov 2001a, 2001b, 2003a, 2003b).
In our analysis of complete genomes, including that of wMel, the first non-Rickettsia member of the Rickettsiales order to have its genome completed, we find support for a grouping of Wolbachia and Rickettsia to the exclusion of the mitochondria, but not for placing the mitochondria within the Rickettsiales order (Figure 5A and 5B; Table S7; Table S8). Specifically, phylogenetic trees of a concatenated alignment of 32 proteins show strong support with all methods (see Table S7) for common branching of: (i) mitochondria, (ii) Rickettsia with Wolbachia, (iii) the free-living α-Proteobacteria, and (iv) mitochondria within α-Proteobacteria. Since amino acid content bias was very severe in these datasets, protein LogDet analyses, which can correct for the bias, were also performed. In LogDet analyses of the concatenated protein alignment, both including and excluding highly biased positions, mitochondria usually branched basal to the Wolbachia–Rickettsia clade, but never specifically with Rickettsia (see Table S7). In addition, in phylogenetic studies of individual genes, there was no consistent phylogenetic position of mitochondrial proteins with any particular species or group within the α-Proteobacteria (see Table S8), although support for a specific branch uniting the two Rickettsia species with Wolbachia was quite strong. Eight of the proteins from mitochondrial genomes (YejW, SecY, Rps8, Rps2, Rps10, RpoA, Rpl15, Rpl32) do not even branch within the α-Proteobacteria, although these genes almost certainly were encoded in the ancestral mitochondrial genome (Lang et al. 1997).
This analysis of mitochondrial and α-Proteobacterial genes reinforces the view that ancient protein phylogenies are inherently prone to error, most likely because current models of phylogenetic inference do not accurately reflect the true evolutionary processes underlying the differences observed in contemporary amino acid sequences (Penny et al. 2001). These conflicting results regarding the precise position of mitochondria within the α-Proteobacteria can be seen in the high amount of networking in the Neighbor-Net graph of the analyses of the concatenated alignment shown in Figure 5. An important complication in studies of mitochondrial evolution lies in identifying “α-Proteobacterial” genes for comparison (Martin 1999). For example, in our analyses, proteins from Magnetococcus branched with other α-Proteobacterial homologs in only 17 of the 49 proteins studied, and in five cases they assumed a position basal to α-, β-, and γ-Proteobacterial homologs.

Host–Symbiont Gene Transfers

Many genes that were once encoded in mitochondrial genomes have been transferred into the host nuclear genomes. Searching for such genes has been complicated by the fact that many of the transfer events happened early in eukaryotic evolution and that there are frequently extreme amino acid and nucleotide composition biases in mitochondrial genomes (see above). We used the wMel genome to search for additional possible mitochondrial-derived genes in eukaryotic nuclear genomes. Specifically, we constructed phylogenetic trees for wMel genes that are not in either Rickettsia genome. Five new eukaryotic genes of possible mitochondrial origin were identified: three genes involved in de novo nucleotide biosynthesis (purD, purM, pyrD) and two conserved hypothetical proteins (WD1005, WD0724). The α-Proteobacterial origin of these genes suggests that at least some of the genes of the de novo nucleotide synthesis pathway in eukaryotes might have been laterally acquired from bacteria via the mitochondria. The presence of such genes in other Proteobacteria suggests that their absence from Rickettsia is due to gene loss (Gray et al. 2001). This finding supports the need for additional α-Proteobacterial genomes to identify mitochondrion-derived genes in eukaryotes.

Figure 5. Mitochondrial Evolution Using Concatenated Alignments

Networks of protein LogDet distances for an alignment of 32 proteins constructed with Neighbor-Net (Bryant and Moulton 2003). The scale bar indicates 0.1 substitutions per site. Enlargements at lower right show the component of shared similarity between mitochondrial-encoded proteins and (i) their homologs from intracellular endosymbionts (red) as well as (ii) their homologs from free-living α-Proteobacteria (blue). (A) Result using 6,776 gap-free sites per genome (heavily biased in amino acid composition). (B) Result using 3,100 sites after exclusion of highly variable positions (data not biased in amino acid composition at p = 0.95). All data and alignments are available upon request. Results of phylogenetic analyses are summarized in Table S7. Since amino acid content bias was very severe in these datasets, protein LogDet analyses were also performed. In neighbor-joining, parsimony, and maximum-likelihood trees generated from alignments both including and excluding highly biased positions (6,776 and 3,100 gap-free amino acid sites per genome, respectively), mitochondria usually branched basal to the Wolbachia–Rickettsia clade, but never specifically with Rickettsia (Table S7).

While organelle to nuclear gene transfers are generally accepted, there is a great deal of controversy over whether other gene transfers have occurred from bacteria into animals. In particular, claims of transfer from bacteria into the human genome (Lander et al. 2001) were later shown to be false (Roelofs and Van Haastert 2001; Salzberg et al. 2001; Stanhope et al. 2001). Wolbachia are excellent candidates for such transfer events since they live inside the germ cells, which would allow lateral transfers to the host to be transmitted to subsequent host generations. Consistent with this, a recent study has shown some evidence for the presence of Wolbachia-like genes in a beetle genome (Kondo et al. 2002). The symbiosis between wMel and D. melanogaster provides an ideal case to search for such transfers since we have the complete genomes of both the host and symbiont. Using BLASTN searches and MUMmer alignments, we did not find any examples of highly similar stretches of DNA shared between the two species. In addition, protein-level searches and phylogenetic trees did not identify any specific relationships between wMel and D. melanogaster for any genes. Thus, at least for this host–symbiont association, we do not find any likely cases of recent gene exchange, with genes being maintained in both host and symbiont. In addition, in our phylogenetic analyses, we did not find any examples of wMel proteins branching specifically with proteins from any invertebrate to the exclusion of other eukaryotes. Therefore, at least for the genes in wMel, we do not find evidence for transfer of Wolbachia genes into any invertebrate genome.

Metabolism and Transport

wMel is predicted to have very limited capabilities for membrane transport, for substrate utilization, and for the biosynthesis of metabolic intermediates (Figure S3), similar to what has been seen in other intracellular symbionts and pathogens (Paulsen et al. 2000). Almost all of the identifiable uptake systems for organic nutrients in wMel are for amino acids, including predicted transporters for proline, aspartate/glutamate, and alanine. This pattern of transporters, coupled with the presence of pathways for the metabolism of the amino acids cysteine, glutamate, glutamine, proline, serine, and threonine, suggests that wMel may obtain much of its energy from amino acids. These amino acids could also serve as material for the production of other amino acids. In contrast, carbohydrate metabolism in wMel appears to be limited. The only pathways that appear to be complete are the tricarboxylic acid cycle, the nonoxidative pentose phosphate pathway, and glycolysis, starting with fructose-1,6-bisphosphate. The limited carbohydrate metabolism is consistent with the presence of only one sugar phosphate transporter. wMel can also apparently transport a range of inorganic ions, although two of these systems, for potassium uptake and sodium ion/proton exchange, are frameshifted. In the latter case, two other sodium ion/proton exchangers may be able to compensate for this defect.
Many of the predicted metabolic properties of wMel, such as the focus on amino acid transport and the presence of limited carbohydrate metabolism, are similar to those found in Rickettsia. A major difference with the Rickettsia spp. is the absence of the ADP–ATP exchanger protein in wMel. In Rickettsia this protein is used to import ATP from the host, thus allowing these species to be direct energy scavengers (Andersson et al. 1998). This likely explains the presence of glycolysis in wMel but not Rickettsia. An inability to obtain ATP from its host also helps explain the presence of pathways for the synthesis of the purines AMP, IMP, XMP, and GMP in wMel but not Rickettsia. Other pathways present in wMel but not Rickettsia include threonine degradation (described above), riboflavin biosynthesis, pyrimidine metabolism (i.e., from PRPP to UMP), and chelated iron uptake (using a single ABC transporter). The two Rickettsia species have a relatively large complement of predicted transporters for osmoprotectants, such as proline and glycine betaine, whereas wMel possesses only two of these systems.

Regulatory Responses

The wMel genome is predicted to encode few proteins for regulatory responses. Three genes encoding two-component system subunits are present: two sensor histidine kinases (WD1216 and WD1284) and one response regulator (WD0221). Only six strong candidates for transcription regulators were identified: a homolog of arginine repressors (WD0453), two members of the TenA family of transcription activator proteins (WD0139 and WD0140), a homolog of ctrA, a transcription regulator for two-component systems in other α-Proteobacteria (WD0732), and two σ factors (RpoH/WD1064 and RpoD/WD1298). There are also seven members of one paralogous family of proteins that are distantly related to phage repressors (see above), although if they have any role in transcription, it is likely only for phage genes. Such a limited repertoire of regulatory systems has also been reported in other endosymbionts and has been explained by the apparently highly predictable and stable environment in which these species live (Andersson et al. 1998; Read et al. 2000; Shigenobu et al. 2000; Moran and Mira 2001; Akman et al. 2002; Seshadri et al. 2003).

Host–Symbiont Interactions

The mechanisms by which Wolbachia infect host cells and by which they cause the diverse phenotypic effects on host reproduction and fitness are poorly understood, and the wMel genome helps identify potential contributing factors. A complete Type IV secretion system, portions of which have been reported in earlier studies, is present. The complete genome sequence shows that in addition to the five vir genes previously described from Wolbachia wKueYO (Masui et al. 2001), an additional four are present in wMel. Of the nine wMel vir ORFs, eight are arranged into two separate operons. Similar to the single operon identified in wTai and wKueYO, the wMel virB8, virB9, virB10, virB11, and virD4 CDSs are adjacent to wspB, forming a 7 kb operon (WD0004–WD0009). The second operon contains virB3, virB4, and virB6 as well as four additional non-vir CDSs, including three putative membrane-spanning proteins, that form part of a 15.7 kb operon (WD0859–WD0853). Examination of the Rickettsia conorii genome shows a similar organization (Figure 6A). The observed conserved gene order for these genes between these two genomes suggests that the putative membrane-spanning proteins could form a novel and, possibly, integral part of a functioning Type IV secretion system within these bacteria. Moreover, reverse transcription (RT)-PCRs have confirmed that wspB and WD0853–WD0856 are each expressed as part of the two vir operons and further indicate that these additional encoded proteins are novel components of the Wolbachia Type IV secretion system (Figure 6B).
In addition to the two major vir clusters, a paralog of virB8 (WD0817) is also present in the wMel genome. WD0818 is quite divergent from virB8 and, as such, does not appear to have resulted from a recent gene duplication event. RT-PCR experiments have failed to show expression of this CDS in wMel-infected Drosophila (data not shown). PCR primers were designed to all CDSs of the wMel Type IV secretion system and used to successfully amplify orthologs from the divergent Wolbachia strains wRi and wAlbB (data not shown). We were able to detect orthologs to all of the wMel Type IV secretion system components as well as most of the adjacent non-vir CDSs, suggesting that this system is conserved across a range of A- and B-group Wolbachia. An increasing body of evidence has highlighted the importance of Type IV secretion systems for the successful infection, invasion, and persistence of intracellular bacteria within their hosts (Christie 2001; Sexton and Vogel 2002). It is likely that the Type IV system in Wolbachia plays a role in the establishment and maintenance of infection and possibly in the generation of reproductive phenotypes.
Genes involved in pathogenicity in bacteria have been found to be frequently associated with regions of anomalous nucleotide composition, possibly owing to transfer from other species or insertion into the genome from plasmids or phage. In the four such regions in wMel (see above; see Table 3), some additional candidates for pathogenicity-related activities are present including a putative penicillin-binding protein (WD0719), genes predicted to be involved in cell wall synthesis (WD0095–WD0098, including D-alanine-D-alanine ligase, a putative FtsQ, and D-alanyl-D-alanine carboxy peptidase) and a multidrug resistance protein (WD0099). In addition, we have identified a cluster of genes in one of the phage regions that may also have some role in host–symbiont interactions. This cluster (WD0611–WD0621) is embedded within the WO-B phage region of the genome (see Figure 2) and contains many genes that encode proteins with putative roles in the synthesis and degradation of surface polysaccharides, including a UDP-glucose 6-dehydrogenase (WD0620). Since this cluster appears to be normal in terms of phylogeny relative to other genes in the genome (i.e., the genes in this region have normal wMel nucleotide composition and branch in phylogenetic trees with genes from other α-Proteobacteria), it is not likely to have been acquired from other species. However, it is possible that these genes can be transferred among Wolbachia strains via the phage, which in turn could lead to some variation in host–symbiont interactions between Wolbachia strains.

Figure 6. Genomic Organization and Expression of Type IV Secretion Operons in wMel

(A) Organization of the nine vir-like CDSs (white arrows) and five adjacent CDSs that encode either putative membrane-spanning proteins (black arrows) or non-vir CDSs (gray arrows) of wMel, R. conorii, and A. tumefaciens. Solid horizontal lines denote RT-PCR experiments that have confirmed that adjacent CDSs are expressed as part of a polycistronic transcript. Results of these RT-PCR experiments are presented in (B). Lane 1, virB3virB4; lane 2, RT control; lane 3, virB6-WD0856; lane 4, RT control; lane 5, WD0856-WD0855; lane 6, RT control; lane 7, WD0854-WD0853; lane 8, RT control; lane 9, virB8virB9; lane 10, RT control; lane 11, virB9virB11; lane 12, RT control; lane 13, virB11virD4; lane 14, RT control; lane 15, virD4wspB; lane 16, RT control; lane 17, virB4virB6; lane 18, RT control; lane 19, WD0855-WD0854; lane 20, RT control. Only PCRs that contain reverse transcriptase amplified the desired products. PCR primer sequences are listed in Table S9.

Of particular interest for host-interaction functions are the large number of genes that encode proteins that contain ankyrin repeats (Table 4). Ankyrin repeats, a tandem motif of around 33 amino acids, are found mainly in eukaryotic proteins, where they are known to mediate protein–protein interactions (Caturegli et al. 2000). While they have been found in bacteria before, they are usually present in only a few copies per species. wMel has 23 ankyrin repeat-containing genes, the most currently described for a prokaryote, with C. burnetii being next with 13. This is particularly striking given wMel’s relatively small genome size. The functions of the ankyrin repeat-containing proteins in wMel are difficult to predict since most have no sequence similarity outside the ankyrin domains to any proteins of known function. Many lines of evidence suggest that the wMel ankyrin domain proteins are involved in regulating host cell-cycle or cell division or interacting with the host cytoskeleton: (i) many ankyrin-containing proteins in eukaryotes are thought to be involved in linking membrane proteins to the cytoskeleton (Hryniewicz-Jankowska et al. 2002); (ii) an ankyrin-repeat protein of Ehrlichia phagocytophila binds condensed chromatin of host cells and may be involved in host cell-cycle regulation (Caturegli et al. 2000); (iii) some of the proteins that modify the activity of cell-cycle-regulating proteins in D. melanogaster contain ankyrin repeats (Elfring et al. 1997); and (iv) the Wolbachia strain that infects the wasp Nasonia vitripennis induces cytoplasmic incompatibility, likely by interacting with these same cell-cycle proteins (Tram and Sullivan 2002). Of the ankyrin-containing proteins in wMel, those worth exploring in more detail include the several that are predicted to be surface targeted or secreted (Table 4) and thus could be targeted to the host nucleus.
It is also possible that some of the other ankyrin-containing proteins are secreted via the Type IV secretion system in a targeting-signal-independent pathway. We call particular attention to three of the ankyrin-containing proteins (WD0285, WD0636, and WD0637), which are among the very few genes, other than those encoding components of the translation apparatus, that have significantly biased codon usage relative to what is expected based on GC content, suggesting they may be highly expressed.

Conclusions

Analysis of the wMel genome reveals that it is unique among sequenced genomes of intracellular organisms in that it is both streamlined and massively infected with mobile genetic elements. The persistence of these elements in the genome for apparently long periods of time suggests that wMel is inefficient at getting rid of them, likely a result of experiencing severe population bottlenecks during every cycle of transovarial transmission as well as during sweeps through host populations. Integration of evolutionary reconstructions and genome analysis (phylogenomics) has provided insights into the biology of Wolbachia, helped identify genes that likely play roles in the unusual effects Wolbachia have on their host, and revealed many new details about the evolution of Wolbachia and mitochondria. Perhaps most importantly, future studies of Wolbachia will benefit both from this genome sequence and from the ability to study host–symbiont interactions in a host (D. melanogaster) well-suited for experimental studies.

Materials and Methods

Purification/source of DNA. wMel DNA was obtained from D. melanogaster yw67c23 flies that naturally carry the wMel infection. wMel was purified from young adult flies on pulsed-field gels as described previously (Sun et al. 2001). Plugs were digested with the restriction enzyme AscI (GG^CGCGCC), which cuts the bacterial chromosome twice (Sun et al. 2001), aiding in the entry of the DNA into agarose gels. After electrophoresis, the resulting two bands were recovered from the gel and stored in 0.5 M EDTA (pH 8.0). DNA was extracted from the gel slices by first washing in TE (Tris–HCl and EDTA) buffer six times for 30 min each to dilute EDTA followed by two 1-h washes in β-agarase buffer (New England Biolabs, Beverly, Massachusetts, United States). Buffer was then removed and the blocks melted at 70°C for 7 min. The molten agarose was cooled to 40°C and then incubated in β-agarase (1 U/100 μl of molten agarose) for 1 h. The digest was cooled to 4°C for 1 h and then centrifuged at 4,100 × gmax for 30 min at 4°C to remove undigested agarose. The supernatant was concentrated on a Centricon YM-100 microconcentrator (Millipore, Bedford, Massachusetts, United States) after prerinsing with 70% ethanol followed by TE buffer and, after concentration, rinsed with TE. The retentate was incubated with proteinase K at 56°C for 2 h and then stored at 4°C. wMel DNA for gap closure was prepared from approximately 1,000 Drosophila adults using the Holmes–Bonner urea/phenol:chloroform protocol (Holmes and Bonner 1973) to prepare total fly DNA.
Library construction/sequencing/closure. The complete genome sequence was determined using the whole-genome shotgun method (Venter et al. 1996). For the random shotgun-sequencing phase, libraries of average size 1.5–2.0 kb and 4.0–8.0 kb were used. After assembly using the TIGR Assembler (Sutton et al. 1995), there were 78 contigs greater than 5,000 bp, 186 contigs greater than 3,000 bp, and 373 contigs greater than 1,500 bp. This number of contigs was unusually high for a 1.27 Mb genome. An initial screen using BLASTN searches against the nonredundant database in GenBank and the Berkeley Drosophila Genome Project site (http://www.fruitfly.org/blast/) showed that 3,912 of the 10,642 contigs were likely contaminants from the Drosophila genome. To aid in closure, the assemblies were rerun with all sequences of likely host origin excluded. Closure, which was made very difficult by the presence of a large amount of repetitive DNA (see below), was done using a mix of primer walking, generation and sequencing of transposon-tagged libraries of large-insert clones, and multiplex PCR (Tettelin et al. 1999). The final sequence showed little evidence for polymorphism within the population of Wolbachia DNA. In addition, to obtain sequence across the AscI-cut sites, PCR was performed on undigested DNA. Substantial host contamination does not seriously affect symbiont genome assembly because most of the Drosophila contigs were small, owing to the approximately 100-fold difference in genome size between host (approximately 180 Mb) and wMel (1.2 Mb).
Since it has been suggested that Wolbachia and their hosts may undergo lateral gene transfer events (Kondo et al. 2002), genome assemblies were rerun using all of the shotgun and closure reads without excluding any sequences that appeared to be of host origin. Only five assemblies were found to match both the D. melanogaster genome and the wMel assembly. Primers were designed to match these assemblies, and PCR was attempted from total DNA of wMel-infected D. melanogaster. In each case, PCR was unsuccessful, and we therefore presume that these assemblies are the result of chimeric cloning artifacts. The complete sequence has been given GenBank accession ID AE017196 and is available at http://www.tigr.org/tdb.
Repeats. Repeats were identified using RepeatFinder (Volfovsky et al. 2001), which makes use of the REPuter algorithm (Kurtz and Schleiermacher 1999) to find maximal-length repeats. Some manual curation and BLASTN and BLASTX searches were used to divide repeat families into different classes.
Annotation. Identification of putative protein-encoding genes and annotation of the genome was done as described previously (Eisen et al. 2002). An initial set of ORFs likely to encode proteins (CDS) was identified with GLIMMER (Salzberg et al. 1998). Putative proteins encoded by the CDS were examined to identify frameshifts or premature stop codons compared to other species. The sequence traces for each were reexamined and, for some, new sequences were generated. Those for which the frameshift or premature stops were of high quality were annotated as “authentic” mutations. Functional assignment, identification of membrane-spanning domains, determination of paralogous gene families, and identification of regions of unusual nucleotide composition were performed as described previously (Tettelin et al. 2001). Phylogenomic analysis (Eisen 1998a; Eisen and Fraser 2003) was used to aid in functional predictions. Alignments and phylogenetic trees were generated as described (Salzberg et al. 2001).
Comparative genomics. All putative wMel proteins were searched using BLASTP against the predicted proteomes of published complete organismal genomes and a set of complete plastid, mitochondrial, plasmid, and viral genomes. The results of these searches were used (i) to analyze the phylogenetic profile (Pellegrini et al. 1999; Eisen and Wu 2002), (ii) to identify putative lineage-specific duplications (those proteins with a top E-value score to another protein from wMel), and (iii) to determine the presence of homologs in different species. Orthologs between the wMel genome and those of the two Rickettsia species were identified by requiring mutual best-hit relationships among all possible pairwise BLASTP comparisons, with some manual correction. Those genes present in both Rickettsia genomes as well as other bacterial species, but not wMel, were considered to have been lost in the wMel branch (see Table S3). Genes present in only one or two of the three species were considered candidates for gene loss or lateral transfer and were also used to identify possible biological differences between these species (see Table S3). For the wMel genes not in the Rickettsia genomes, proteins were searched with BLASTP against the TIGR NRAA database. Protein sequences of their homologs were aligned with CLUSTALW and manually curated. Neighbor-joining trees were constructed using the PHYLIP package.
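The mutual best-hit criterion used for ortholog identification can be illustrated with a minimal sketch. It assumes that BLASTP output has already been reduced to (query, subject, bit score) tuples; the function names and the toy identifiers and scores are hypothetical and are not taken from the pipeline used in this study, which also involved manual correction.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's top hit."""
    top_ab = best_hits(a_vs_b)
    top_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in top_ab.items() if top_ba.get(b) == a)


# Toy, made-up identifiers and scores for illustration only.
wmel_vs_rick = [("wA", "rX", 250.0), ("wA", "rY", 90.0),
                ("wB", "rY", 120.0), ("wC", "rX", 80.0)]
rick_vs_wmel = [("rX", "wA", 245.0), ("rY", "wB", 118.0), ("rY", "wA", 88.0)]

pairs = mutual_best_hits(wmel_vs_rick, rick_vs_wmel)
print(pairs)
```

In this toy case wC finds rX as its best hit, but rX prefers wA, so wC is left without an ortholog; a protein in that position would be a candidate lineage-specific duplicate or a case for manual inspection.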
Phylogenetic analysis of mitochondrial proteins. For phylogenetic analysis, the set of all 38 proteins encoded in both the Marchantia polymorpha and Reclinomonas americana (Lang et al. 1997) mitochondrial genomes was collected. Acanthamoeba castellanii was excluded due to high divergence and extremely long evolutionary branches. Six genes were excluded from further analysis because they were too poorly conserved for alignment and phylogenetic analysis (nad7, rps10, sdh3, sdh4, tatC, and yejV), leaving 32 genes for investigation: atp6, atp9, atpA, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad9, rpl16, rpl2, rpl5, rpl6, rps1, rps11, rps12, rps13, rps14, rps19, rps2, rps3, rps4, rps7, rps8, yejR, and yejU. Using FASTA with the mitochondrial proteins as a query, homologs were identified from the genomes of seven α-Proteobacteria: two intracellular symbionts (W. pipientis wMel and Rickettsia prowazekii) and five free-living forms (Sinorhizobium loti, Agrobacterium tumefaciens, Brucella melitensis, Mesorhizobium loti, and Rhodopseudomonas sp.). Escherichia coli and Neisseria meningitidis were used as outgroups. Caulobacter crescentus was excluded from analysis because homologs of some of the 32 genes were not found in the current annotation. In the event that more than one homolog was identified per genome, the one with the greatest sequence identity to the mitochondrial query was retrieved. Proteins were aligned using CLUSTALW (Thompson et al. 1994) and concatenated. To reduce the influence of poorly aligned regions, all sites that contained a gap at any position were excluded from analysis, leaving 6,776 positions per genome for analysis. The data contained extreme amino acid bias: all sequences failed the χ2 test at p = 0.95 for deviation from amino acid frequency distribution assumed under either the JTT or mtREV24 models as determined with PUZZLE (Strimmer and von Haeseler 1996).
When the data were iteratively purged of highly variable sites using the method described (Hansmann and Martin 2000), amino acid composition gradually came into better agreement with the amino acid frequency distribution assumed by the model. The longest dataset in which all sequences passed the χ2 test at p = 0.95 consisted of the 3,100 least polymorphic sites. PROTML (Adachi and Hasegawa 1996) analyses of the 3,100-site data using the JTT model detected mitochondria as sisters of the five free-living α-Proteobacteria with low (72%) support, whereas PUZZLE, using the same data, detected mitochondria as sisters of the two intracellular symbionts, also with low (85%) support. This suggested the presence of conflicting signal in the less-biased subset of the data. Therefore, protein log determinants (LogDet) were used to infer distances from the 6,776-site data, since the method can correct for amino acid bias (Lockhart et al. 1994), and Neighbor-Net (Bryant and Moulton 2003) was used to display the resulting matrix, because it can detect and display conflicting signal. The result (see Figure 5A) shows both signals. In no analysis was a sister relationship between Rickettsia and mitochondria detected.
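Two of the steps above, removal of every alignment column that contains a gap and comparison of a sequence's amino acid composition with model frequencies, can be sketched as follows. This is only an illustration under simplifying assumptions: the function names, toy sequences, and two-letter frequency table are hypothetical, and the second function returns a plain Pearson χ2 statistic rather than reproducing the test as implemented in PUZZLE.

```python
def strip_gapped_columns(alignment, gap="-"):
    """Drop every column in which at least one aligned sequence has a gap.

    alignment: dict mapping sequence name -> aligned sequence string.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) != 1:
        raise ValueError("all aligned sequences must have equal length")
    kept = [column for column in zip(*seqs) if gap not in column]
    return {name: "".join(column[i] for column in kept)
            for i, name in enumerate(names)}


def composition_chi2(seq, expected_freqs):
    """Pearson chi-square statistic of residue counts against expected frequencies.

    Residues absent from expected_freqs are ignored in this simplified sketch.
    """
    n = len(seq)
    return sum((seq.count(aa) - n * freq) ** 2 / (n * freq)
               for aa, freq in expected_freqs.items() if freq > 0)


# Toy alignment for illustration only.
toy = {"mito": "MK-LV", "wMel": "MKALV", "Rpro": "MR-L-"}
stripped = strip_gapped_columns(toy)
print(stripped)
print(composition_chi2("AAAB", {"A": 0.5, "B": 0.5}))
```

Requiring a gap-free column across all taxa is a strict filter, which is why the concatenated alignment shrinks to the 6,776 positions reported above.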
For analyses of individual genes, the 63 proteins encoded in the Reclinomonas mitochondrial genome were compared with FASTA to the proteins from 49 sequenced eubacterial genomes, which included the α-Proteobacteria shown in Figure 5, R. conorii, and Magnetococcus MC1, one of the more divergent α-Proteobacteria. Of those proteins, 50 had sufficiently well-conserved homologs to perform phylogenetic analyses. Homologs were aligned and subjected to phylogenetic analysis with PROTML (Adachi and Hasegawa 1996).
Analysis of wspB sequences. To compare wspB sequences from different Wolbachia strains, PCR was done on total DNA extracted from the following sources: wRi was obtained from infected adult D. simulans, Riverside strain; wAlbB was obtained from the infected Aa23 cell line (O’Neill et al. 1997b), and D. immitis Wolbachia was extracted from adult worm tissue. DNA extraction and PCR were done as previously described (Zhou et al. 1998) with wspB-specific primers (wspB-F, 5′-TTTGCAAGTGAAACAGAAGG and wspB-R, 5′-GCTTTGCTGGCAAAATGG). PCR products were cloned into pGem-T vector (Promega, Madison, Wisconsin, United States) as previously described (Zhou et al. 1998) and sequenced (GenBank accession numbers AJ580921–AJ580923). These sequences were compared to previously sequenced wsp genes for the same Wolbachia strains (GenBank accession numbers AF020070, AF020059, and AJ252062). The four partial wsp sequences were aligned using CLUSTALV (Higgins et al. 1992) based on the amino acid translation of each gene and similarly with the wspB sequences. Genetic distances were calculated using the Kimura 2-parameter method and are reported in Table S5.
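For readers unfamiliar with the Kimura 2-parameter distance, a minimal sketch is given here. With P the proportion of aligned sites that differ by a transition and Q the proportion that differ by a transversion, the distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The function name and the toy sequences are hypothetical, and sites containing gaps or ambiguity codes are simply skipped, which is an assumption rather than a description of the software actually used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the sequences are too divergent (saturated).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy sequences: one A<->G transition in ten sites.
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(round(d, 4))
```

The corrected distance is slightly larger than the raw proportion of differing sites (0.1 in the toy case), because the model allows for multiple substitutions at the same site.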
Type IV secretion system. To determine whether the vir-like CDSs, as well as adjacent ORFs, were actively expressed within wMel as two polycistronic operons, RT-PCR was used. Total RNA was isolated from infected D. melanogaster yw67c23 adults using Trizol reagent (Invitrogen, Carlsbad, California, United States) and cDNA synthesized using SuperScript III RT (Invitrogen) using primers wspBR, WD0817R, WD0853R, and WD0852R. RNA isolation and RT were done according to manufacturer’s protocols, with the exception that the suggested initial incubation of RNA template and primers at 65°C for 5 min and final heat denaturation of RT-enzyme at 70°C for 15 min were not done. PCR was done using rTaq (Takara, Kyoto, Japan), and several primer sets were used to amplify regions spanning adjacent CDSs for most of the two operons. For operon virB3-WD0853, the following primers were used: (virB3virB4)F, (virB3virB4)R, (virB6-WD0856)F, (virB6-WD0856)R, (WD0856-WD0855)F, (WD0856-WD0855)R, (WD0854-WD0853)F, (WD0854-WD0853)R. For operon virB8wspB, the following primers were used: (virB8virB9)F, (virB8virB9)R, (virB9virB11)F, (virB9virB11)R, (virB11virD4)F, (virB11virD4)R, (virD4wspB)F, and (virD4wspB)R. The coexpression of virB4 and virB6, as well as WD0855 and WD0854, was confirmed within the putative virB3-WD0853 operon using nested PCR with the following primers: (virB4virB6)F1, (virB4virB6)R1, (virB4virB6)F2, (virB4virB6)R2, (WD0855-WD0854)F1, (WD0855-WD0854)R1, (WD0855-WD0854)F2, and (WD0855-WD0854)R2. All ORFs within the putative virB8wspB operon were shown to be coexpressed and are thus considered to be a genuine operon. All products were amplified only from RT-positive reactions (see Figure 6). Primer sequences are given in Table S9.

Supporting Information

Figure S1. Phage Trees

Phylogenetic tree showing the relationship between WO-A and WO-B phage from wMel with reported phage from wKue and wTai. The tree was generated from a CLUSTALW multiple sequence alignment (Thompson et al. 1994) using the PROTDIST and NEIGHBOR programs of PHYLIP (Felsenstein 1989).
(60 KB PDF).

Figure S2. Plot of the Effective Number of Codons against GC Content at the Third Codon Position (GC3)

Proteins with fewer than 100 residues are excluded from this analysis because their effective number of codons (ENc) values are unreliable. The curve shows the expected ENc values if codon usage bias is caused by GC variation alone. Colors: yellow, hypothetical; purple, mobile element; blue, others. Most of the variation in codon bias can be traced to variation in GC, indicating that mutational forces dominate wMel codon usage. Multivariate analysis of codon usage was performed using the CODONW package (available from http://www.molbiol.ox.ac.uk/cu/codonW.html).
(289 KB PDF).
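The expected curve mentioned in the Figure S2 legend can be sketched under one assumption: that it follows the standard expectation proposed by Wright (1990) for the ENc statistic, ENc = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3 fraction. The legend does not state which formula was used, so the function below (a hypothetical name) should be read as an illustration only.

```python
def expected_enc(gc3):
    """Expected ENc when codon bias reflects third-position GC content alone.

    Assumes Wright's (1990) approximation; gc3 is a fraction between 0 and 1.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


# Tabulate the curve at GC3 = 0.0, 0.1, ..., 1.0.
curve = [(s / 10, round(expected_enc(s / 10), 2)) for s in range(11)]
for gc3, enc in curve:
    print(gc3, enc)
```

Genes that fall well below this curve have stronger codon bias than GC3 alone predicts, which is the pattern that would suggest selection on codon usage rather than mutational bias.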

Figure S3. Predicted Metabolism and Transport in wMel

Overview of the predicted metabolism (energy production and organic compounds) and transport in wMel. Transporters are grouped by predicted substrate specificity: inorganic cations (green), inorganic anions (pink), carbohydrates (yellow), and amino acids/peptides/amines/purines and pyrimidines (red). Transporters in the drug-efflux family (labeled as “drugs”) and those of unknown specificity are colored black. Arrows indicate the direction of transport. Energy-coupling mechanisms are also shown: solutes transported by channel proteins (double-headed arrow); secondary transporters (two-arrowed lines, indicating both the solute and the coupling ion); ATP-driven transporters (ATP hydrolysis reaction); unknown energy-coupling mechanism (single arrow). Transporter predictions are based upon a phylogenetic classification of transporter proteins (Paulsen et al. 1998).
(167 KB PDF).

Table S1. Repeats of Greater Than 50 bp in the wMel Genome (with Coordinates)

(649 KB DOC).

Table S2. Inactivated Genes in the wMel Genome

(147 KB DOC).

Table S3. Ortholog Comparison with Rickettsia spp.

(718 KB XLS).

Table S4. Putative Lineage-Specific Gene Duplications in wMel

(116 KB DOC).

Table S5. Genetic Distances as Calculated for Alignments of wsp and wspB Gene Sequences from the Same Wolbachia Strains

(24 KB DOC).

Table S6. Putative DNA Repair and Recombination Genes in the wMel Genome

(26 KB DOC).

Table S7. Phylogenetic Results for Concatenated Data of 32 Mitochondrial Proteins

(34 KB DOC).

Table S8. Individual Phylogenetic Results for Reclinomonas Mitochondrial DNA-Encoded Proteins

(117 KB DOC).

Table S9. PCR Primers

(47 KB DOC).

Accession Numbers

The complete sequence for wMel has been given GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) accession ID number AE017196 and is available through the TIGR Comprehensive Microbial Resource at http://www.tigr.org/tigr-scripts/CMR2/GenomePage3.spl?database=dmg
The GenBank accession numbers for other sequences discussed in this paper are AF020059 (Wolbachia sp. wAlbB outer surface protein precursor wsp gene), AF020070 (Wolbachia sp. wRi outer surface protein precursor wsp gene), AJ252062 (Wolbachia endosymbiont of D. immitis sp. gene for surface protein), AJ580921 (Wolbachia endosymbiont of D. immitis partial wspB gene for Wolbachia surface protein B), AJ580922 (Wolbachia endosymbiont of A. albopictus partial wspB gene for Wolbachia surface protein B), and AJ580923 (Wolbachia endosymbiont of D. simulans partial wspB gene for Wolbachia surface protein B).

Acknowledgments

We acknowledge Barton Slatko, Jeremy Foster, New England Biolabs, and Mark Blaxter for helping inspire this project; Rekha Seshadri for help in examining pathogenicity factors and reading the manuscript; Derek Fouts for examination of group II introns; Susan Lo, Michael Heaney, Vadim Sapiro, and Billy Lee for IT support; Maria-Ines Benito, Naomi Ward, Michael Eisen, Howard Ochman, and Vincent Daubin for helpful discussions; Steven Salzberg and Mihai Pop for help in comparing wMel with the D. melanogaster genome; Elodie Ghedin for access to the B. malayi Wolbachia sequence data; Maria Ermolaeva for assistance with analysis of operons; Dan Haft for designing protein family hidden Markov models for annotation; Owen White for general bioinformatics support; four anonymous reviewers for very helpful comments and suggestions; and Claire M. Fraser for continuing support of TIGR’s scientific research. This project was supported by grant UO1-AI47409–01 to Scott O’Neill and Jonathan A. Eisen from the National Institute of Allergy and Infectious Diseases.
Conflicts of interest. The authors have declared that no conflicts of interest exist.
Author contributions. M. Wu contributed ideas and analysis in all aspects of the work. L. Sun performed purification of wMel DNA for initial libraries and closure. J. Vamathevan was the closure team leader, performed sequence assembly and analysis, and screened contigs against the Drosophila genome. M. Riegler performed validation of assembly against the physical map and confirmation of rearrangements by long PCR and analysis of repeat regions. R. Deboy was the annotation leader and managed the annotation, ORF management, and frameshifts. J. C. Brownlie performed analysis of Type IV secretion systems. E. A. McGraw performed validation of assembly against physical map and confirmation of rearrangements by long PCR and analysis of wsp paralogs. W. Martin, C. Esser, N. Ahmadinejad, and C. Wiegand performed the mitochondrial evolution analysis. R. Madupu, M. J. Beanan, L. M. Brinkac, S. C. Daugherty, A. S. Durkin, J. F. Kolonay, and W. C. Nelson performed genome annotation. Y. Mohamoud, P. Lee, and K. Berry performed the closure experiments (closed sequencing gaps, multiplex PCR, resolution of small repeats, coverage reactions, contig editing, resolution of large repeats by transposon and primer walking). M. B. Young was the shotgun sequencing leader. T. Utterback and J. Weidman performed shotgun sequencing and frameshift checking; Utterback also worked on the assembly. W. C. Nierman handled the library construction. I. T. Paulsen performed transporter analysis. K. E. Nelson performed metabolism analysis. H. Tettelin analyzed genome properties, repeats, and membrane proteins. S. L. O’Neill and J. A. Eisen supplied ideas, coordination, and analysis; Eisen is the corresponding author.

  1. Adachi J, Hasegawa M (1996) Model of amino acid substitution in proteins encoded by mitochondrial DNA. J Mol Evol 42:459–468.
  2. Akman L, Yamashita A, Watanabe H, Oshima K, Shiba T, et al. (2002) Genome sequence of the endocellular obligate symbiont of tsetse flies, Wigglesworthia glossinidia. Nat Genet 32:402–407.
  3. Andersson SG, Zomorodipour A, Andersson JO, Sicheritz-Ponten T, Alsmark UC, et al. (1998) The genome sequence of Rickettsia prowazekii and the origin of mitochondria. Nature 396:133–140.
  4. Andersson SG, Karlberg O, Canback B, Kurland CG (2003) On the origin of mitochondria: A genomics perspective. Philos Trans R Soc Lond B Biol Sci 358:165–167.
  5. Bazinet C, Rollins JE (2003) Rickettsia-like mitochondrial motility in Drosophila spermiogenesis. Evol Dev 5:379–385.
  6. Bjorkholm B, Sjolund M, Falk PG, Berg OG, Engstrand L, et al. (2001) Mutation frequency and biological cost of antibiotic resistance in Helicobacter pylori. Proc Natl Acad Sci U S A 98:14607–14612.
  7. Boyle L, O’Neill SL, Robertson HM, Karr TL (1993) Interspecific and intraspecific horizontal transfer of Wolbachia in Drosophila. Science 260:1796–1799.
  8. Braig HR, Zhou W, Dobson SL, O’Neill SL (1998) Cloning and characterization of a gene encoding the major surface protein of the bacterial endosymbiont Wolbachia pipientis. J Bacteriol 180:2373–2378.
  9. Bryant D, Moulton V (2003) Neighbor-Net: An agglomerative method for the construction of phylogenetic networks. Mol Biol Evol 20 Dec 5 [Epub ahead of print].
  10. Caturegli P, Asanovich KM, Walls JJ, Bakken JS, Madigan JE, et al. (2000) ankA: An Ehrlichia phagocytophila group gene encoding a cytoplasmic protein antigen with ankyrin repeats. Infect Immun 68:5277–5283.
  11. Christie PJ (2001) Type IV secretion: Intercellular transfer of macromolecules by systems ancestrally related to conjugation machines. Mol Microbiol 40:294–305.
  12. Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, et al. (1999) Alignment of whole genomes. Nucleic Acids Res 27:2369–2376.
  13. Dumler SJ, Barbet AF, Bekker CPJ, Dasch GA, Palmer GH, et al. (2001) Reorganization of genera in the families Rickettsiaceae and Anaplasmataceae in the order Rickettsiales: Unification of some species of Ehrlichia with Anaplasma, Cowdria with Ehrlichia and Ehrlichia with Neorickettsia—Descriptions of six new species combinations and designation of Ehrlichia equi and “HGE agent” as subjective synonyms of Ehrlichia phagocytophila. Intl J System Evol Microbiol 51:2145–2165.
  14. Eiglmeier K, Parkhill J, Honore N, Garnier T, Tekaia F, et al. (2001) The decaying genome of Mycobacterium leprae. Lepr Rev 72:387–398.
  15. Eisen JA (1997) Gastrogenomic delights: A movable feast. Nat Med 3:1076–1078.
  16. Eisen JA (1998a) A phylogenomic study of the MutS family of proteins. Nucleic Acids Res 26:4291–4300.
  17. Eisen JA (1998b) Phylogenomics: Improving functional predictions for uncharacterized genes by evolutionary analysis. Genome Res 8:163–167.
  18. Eisen JA, Fraser CM (2003) Phylogenomics: Intersection of evolution and genomics. Science 300:1706–1707.
  19. Eisen JA, Hanawalt PC (1999) A phylogenomic study of DNA repair genes, proteins, and processes. Mutat Res 435:171–213.
  20. Eisen JA, Wu M (2002) Phylogenetic analysis and gene functional predictions: Phylogenomics in action. Theor Popul Biol 61:481–487.
  21. Eisen JA, Heidelberg JF, White O, Salzberg SL (2000) Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol 1:1–9 RESEARCH0011.
  22. Eisen JA, Nelson KE, Paulsen IT, Heidelberg JF, Wu M, et al. (2002) The complete genome sequence of Chlorobium tepidum TLS, a photosynthetic, anaerobic, green-sulfur bacterium. Proc Natl Acad Sci U S A 99:9509–9514.
  23. Elfring LK, Axton JM, Fenger DD, Page AW, Carminati JL, et al. (1997) Drosophila PLUTONIUM protein is a specialized cell cycle regulator required at the onset of embryogenesis. Mol Biol Cell 8:583–593.
  24. Emelyanov VV (2001a) Evolutionary relationship of Rickettsiae and mitochondria. FEBS Lett 501:11–18.
  25. Emelyanov VV (2001b) Rickettsiaceae, Rickettsia-like endosymbionts, and the origin of mitochondria. Biosci Rep 21:1–17.
  26. Emelyanov VV (2003a) Mitochondrial connection to the origin of the eukaryotic cell. Eur J Biochem 270:1599–1618.
  27. Emelyanov VV (2003b) Phylogenetic affinity of a Giardia lamblia cysteine desulfurase conforms to canonical pattern of mitochondrial ancestry. FEMS Microbiol Lett 226:257–266.
  28. Felsenstein J (1989) PHYLIP—Phylogeny inference package (version 3.2). Cladistics 5:164–166.
  29. Frank AC, Amiri H, Andersson SG (2002) Genome deterioration: Loss of repeated sequences and accumulation of junk DNA. Genetica 115:1–12.
  30. Gray MW, Burger G, Lang BF (2001) The origin and early evolution of mitochondria. Genome Biol 2:REVIEWS1018.
  31. Gupta RS (1995) Evolution of the chaperonin families (Hsp60, Hsp10 and Tcp-1) of proteins and the origin of eukaryotic cells. Mol Microbiol 15:1–11.
  32. Hansmann S, Martin W (2000) Phylogeny of 33 ribosomal and six other proteins encoded in an ancient gene cluster that is conserved across prokaryotic genomes: Influence of excluding poorly alignable sites from analysis. Int J Syst Evol Microbiol 50:1655–1663.
  33. Higgins D, Bleasby A, Fuchs R (1992) ClustalV: Improved software for multiple sequence alignment. Comput Appl Biosci 8:189–191.
  34. Holmes DS, Bonner J (1973) Preparation, molecular weight, base composition, and secondary structure of giant nuclear ribonucleic acid. Biochemistry 12:2330–2338.
  35. Hryniewicz-Jankowska A, Czogalla A, Bok E, Sikorsk AF (2002) Ankyrins, multifunctional proteins involved in many cellular pathways. Folia Histochem Cytobiol 40:239–249. Find this article online
  36. Itoh T, Martin W, Nei M (2002) Acceleration of genomic evolution caused by enhanced mutation rate in endocellular symbionts. Proc Natl Acad Sci U S A 99:12944–12948. Find this article online
  37. Jamnongluk W, Kittayapong P, Baimai V, O’Neill SL (2002) Wolbachia infections of tephritid fruit flies: Molecular evidence for five distinct strains in a single host species. Curr Microbiol 45:255–260. Find this article online
  38. Jeyaprakash A, Hoy MA (2000) Long PCR improves Wolbachia DNA amplification: wsp sequences found in 76% of sixty-three arthropod species. Insect Mol Biol 9:393–405. Find this article online
  39. Karlin S, Brocchieri L (2000) Heat shock protein 60 sequence comparisons: Duplications, lateral transfer, and mitochondrial evolution. Proc Natl Acad Sci U S A 97:11348–11353. Find this article online
  40. Kondo N, Nikoh N, Ijichi N, Shimada M, Fukatsu T (2002) Genome fragment of Wolbachia endosymbiont transferred to X chromosome of host insect. Proc Natl Acad Sci U S A 99:14280–14285. Find this article online
  41. Kurtz S, Schleiermacher C (1999) REPuter: Fast computation of maximal repeats in complete genomes. Bioinformatics 15:426–427. Find this article online
  42. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921. Find this article online
  43. Lang BF, Burger G, O’Kelly CJ, Cedergren R, Golding GB, et al. (1997) An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature 387:493–497. Find this article online
  44. Lang BF, Seif E, Gray MW, O’Kelly CJ, Burger G (1999) A comparative genomics approach to the evolution of eukaryotes and their mitochondria. J Eukaryot Microbiol 46:320–326. Find this article online
  45. Lawrence JG (2001) Catalyzing bacterial speciation: Correlating lateral transfer with genetic headroom. Syst Biol 50:479–496. Find this article online
  46. Lawrence JG, Ochman H (1997) Amelioration of bacterial genomes: Rates of change and exchange. J Mol Evol 44:383–397. Find this article online
  47. Lawrence JG, Ochman H (1998) Molecular archaeology of the Escherichia coli genome. Proc Natl Acad Sci U S A 95:9413–9417. Find this article online
  48. Lin M, Rikihisha Y (2003) Ehrlichia chaffeensis and Anaplasma phagocytophilum lack genes for lipid A biosynthesis and incorporate cholesterol for their survival. Infect Immun 71:5324–5331. Find this article online
  49. Lo N, Casiraghi M, Salati E, Bazzocchi C, Bandi C (2002) How many Wolbachia supergroups exist? Mol Biol Evol 19:341–346. Find this article online
  50. Lockhart PJ, Steel MA, Hendy MD, Penny D (1994) Recovering evolutionary trees under a more realistic evolutionary model. Mol Biol Evol 11:605–612. Find this article online
  51. Martin W (1999) Mosaic bacterial chromosomes: A challenge en route to a tree of genomes. Bioessays 21:99–104. Find this article online
  52. Masui S, Sasaki T, Ishikawa H (2000) Genes for the type IV secretion system in an intracellular symbiont, Wolbachia, a causative agent of various sexual alterations in arthropods. J Bacteriol 182(22):6529–6531. Find this article online
  53. Masui S, Kuroiwa H, Sasaki T, Inui M, Kuroiwa T, et al. (2001) Bacteriophage WO and virus-like particles in Wolbachia, an endosymbiont of arthropods. Biochem Biophys Res Commun 283:1099–1104. Find this article online
  54. McGraw EA, Merritt DJ, Droller JN, O’Neill SL (2001) Wolbachia-mediated sperm modification is dependent on the host genotype in Drosophila. Proc R Soc Lond B Biol Sci 268:2565–2570. Find this article online
  55. Mira A, Ochman H, Moran NA (2001) Deletional bias and the evolution of bacterial genomes. Trends Genet 17:589–596. Find this article online
  56. Moran NA (1996) Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc Natl Acad Sci U S A 93:2873–2878. Find this article online
  57. Moran NA, Mira A (2001) The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol 2:RESEARCH0054.
  58. Muller M, Martin W (1999) The genome of Rickettsia prowazekii and some thoughts on the origin of mitochondria and hydrogenosomes. Bioessays 21:377–381. Find this article online
  59. O’Neill SL, Hoffmann AA, Werren JH, editors (1997a) Influential passengers: Inherited microorganisms and arthropod reproduction. Oxford: Oxford University Press. 228 p.
  60. O’Neill SL, Pettigrew MM, Sinkins SP, Braig HR, Andreadis TG, et al. (1997b) In vitro cultivation of Wolbachia pipientis in an Aedes albopictus cell line. Insect Mol Biol 6:33–39. Find this article online
  61. Ogata H, Audic S, Renesto-Audiffren P, Fournier PE, Barbe V, et al. (2001) Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293:2093–2098. Find this article online
  62. Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, et al. (2001) Genome sequence of Yersinia pestis, the causative agent of plague. Nature 413:523–527. Find this article online
  63. Parkhill J, Sebaihia M, Preston A, Murphy LD, Thomson N, et al. (2003) Comparative analysis of the genome sequences of Bordetella pertussis, Bordetella parapertussis and Bordetella bronchiseptica. Nat Genet 35:32–40. Find this article online
  64. Paulsen IT, Sliwinski MK, Saier MH Jr (1998) Microbial genome analyses: Global comparisons of transport capabilities based on phylogenies, bioenergetics and substrate specificities. J Mol Biol 277:573–592. Find this article online
  65. Paulsen IT, Nguyen L, Sliwinski MK, Rabus R, Saier MH Jr (2000) Microbial genome analyses: Comparative transport capabilities in eighteen prokaryotes. J Mol Biol 301:75–100. Find this article online
  66. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO (1999) Assigning protein functions by comparative genome analysis: Protein phylogenetic profiles. Proc Natl Acad Sci U S A 96:4285–4288. Find this article online
  67. Penny D, McComish BJ, Charleston MA, Hendy MD (2001) Mathematical elegance with biochemical realism: The covarion model of molecular evolution. J Mol Evol 53:711–723. Find this article online
  68. Read TD, Brunham RC, Shen C, Gill SR, Heidelberg JF, et al. (2000) Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39. Nucleic Acids Res 28:1397–1406. Find this article online
  69. Roelofs J, Van Haastert PJ (2001) Genes lost during evolution. Nature 411:1013–1014. Find this article online
  70. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26:544–548. Find this article online
  71. Salzberg SL, White O, Peterson J, Eisen JA (2001) Microbial genes in the human genome: Lateral transfer or gene loss? Science 292:1903–1906. Find this article online
  72. Selby CP, Witkin EM, Sancar A (1991) Escherichia coli mfd mutant deficient in “mutation frequency decline” lacks strand-specific repair: In vitro complementation with purified coupling factor. Proc Natl Acad Sci U S A 88:11574–11578. Find this article online
  73. Seshadri R, Paulsen IT, Eisen JA, Read TD, Nelson KE, et al. (2003) Complete genome sequence of the Q-fever pathogen Coxiella burnetii. Proc Natl Acad Sci U S A 100:5455–5460. Find this article online
  74. Sexton JA, Vogel JP (2002) Type IVB secretion by intracellular pathogens. Traffic 3:178–185. Find this article online
  75. Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H (2000) Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature 407:81–86. Find this article online
  76. Sicheritz-Ponten T, Kurland CG, Andersson SG (1998) A phylogenetic analysis of the cytochrome b and cytochrome c oxidase I genes supports an origin of mitochondria from within the Rickettsiaceae. Biochim Biophys Acta 1365:545–551. Find this article online
  77. Sinkins SP, O’Neill SL (2000) Wolbachia as a vehicle to modify insect populations. In: James AA, editor. Insect transgenesis: Methods and applications. Boca Raton, Florida: CRC Press. 271–288.
  78. Stanhope MJ, Lupas A, Italia MJ, Koretke KK, Volker C, et al. (2001) Phylogenetic analyses do not support horizontal gene transfers from bacteria to vertebrates. Nature 411:940–944. Find this article online
  79. Strimmer K, von Haeseler A (1996) Quartet puzzling: A quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 13:964–969. Find this article online
  80. Sun LV, Foster JM, Tzertzinis G, Ono M, Bandi C, et al. (2001) Determination of Wolbachia genome size by pulsed-field gel electrophoresis. J Bacteriol 183:2219–2225. Find this article online
  81. Sun LV, Riegler M, O’Neill SL (2003) Development of a physical and genetic map of the virulent Wolbachia strain wMelPop. J Bacteriol 185:7077–7084. Find this article online
  82. Sutton G, White O, Adams M, Kerlavage A (1995) TIGR assembler: A new tool for assembling large shotgun sequencing projects. Genome Sci Tech 1:9–19. Find this article online
  83. Tamas I, Klasson L, Canback B, Naslund AK, Eriksson AS, et al. (2002) 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376–2379. Find this article online
  84. Taylor MJ (2002) A new insight into the pathogenesis of filarial disease. Curr Mol Med 2:299–302. Find this article online
  85. Taylor MJ, Hoerauf A (2001) A new approach to the treatment of filariasis. Curr Opin Infect Dis 14:727–731. Find this article online
  86. Taylor MJ, Bandi C, Hoerauf AM, Lazdins J (2000) Wolbachia bacteria of filarial nematodes: A target for control? Parasitol Today 16:179–180. Find this article online
  87. Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL (1999) Optimized multiplex PCR: Efficiently closing a whole-genome shotgun sequencing project. Genomics 62:500–507. Find this article online
  88. Tettelin H, Nelson KE, Paulsen IT, Eisen JA, Read TD, et al. (2001) Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science 293:498–506. Find this article online
  89. Thompson JD, Higgins DG, Gibson TJ (1994) ClustalW: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680. Find this article online
  90. Tram U, Sullivan W (2002) Role of delayed nuclear envelope breakdown and mitosis in Wolbachia-induced cytoplasmic incompatibility. Science 296:1124–1126. Find this article online
  91. van Ham RC, Kamerbeek J, Palacios C, Rausell C, Abascal F, et al. (2003) Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci U S A 100:581–586. Find this article online
  92. Venter JC, Smith HO, Hood L (1996) A new strategy for genome sequencing. Nature 381:364–366. Find this article online
  93. Viale AM, Arakaki AK (1994) The chaperone connection to the origins of the eukaryotic organelles. FEBS Lett 341:146–151. Find this article online
  94. Volfovsky N, Haas BJ, Salzberg SL (2001) A clustering method for repeat analysis in DNA sequences. Genome Biol 2:RESEARCH0027.
  95. Ware J, Moran L, Foster J, Posfai J, Vincze T, et al. (2002) Sequencing and analysis of a 63 kb bacterial artificial chromosome insert from the Wolbachia endosymbiont of the human filarial parasite Brugia malayi. Int J Parasitol 32:159–166. Find this article online
  96. Wernegreen J, Moran NA (1999) Evidence for genetic drift in endosymbionts (Buchnera): Analyses of protein-coding genes. Mol. Biol. Evol 16:83–97. Find this article online
  97. Werren JH (1998) Wolbachia and speciation. In: Berlocher SH, editor. Endless forms: Species and speciation. New York: Oxford University Press. 245–260.
  98. Werren JH, O’Neill SL (1997) The evolution of heritable symbionts. In: O’Neill SL, Hoffmann AA, Werren JH, editors. Influential passengers: Inherited microorganisms and arthropod reproduction. Oxford: Oxford University Press. 1–41.
  99. Werren JH, Windsor DM (2000) Wolbachia infection frequencies in insects: Evidence of a global equilibrium? Proc R Soc Lond B Biol Sci 267:1277–1285. Find this article online
  100. Witkin EM (1994) Mutation frequency decline revisited. Bioessays 16:437–444. Find this article online
  101. Zhou W, Rousset F, O’Neill SL (1998) Phylogeny and PCR-based classification of Wolbachia strains using wsp gene sequences. Proc R Soc Lond B Biol Sci 265:509–515. Find this article online

Why I am ashamed to have a paper in Science

So I just had a paper published in Science last week. In many ways, it has all the makings of one of those papers I should be really proud of. First, it represents a collaboration with my undergraduate advisor, Colleen Cavanaugh, the person who inspired me to go to graduate school and who got me interested in microorganisms, which I have worked on ever since (I published my first scientific paper on work I did in her lab). The paper is on one of the coolest biological systems on the planet – bacterial symbionts of deep sea animals that allow these animals to function much like plants (they use chemosynthesis in much the same way plants use photosynthesis). Studies of the deep sea and of chemosynthesis are important for understanding the origin and evolution of life, for understanding global carbon cycles, for understanding the rules by which symbioses evolve and much more. And on top of all of this, the paper reports the sequencing and analysis of the complete genome of one of these symbionts (that from the clam Calyptogena magnifica) – and one of my main areas of research is on the evolution of the genomes of symbionts. And the genome was sequenced at the Joint Genome Institute, where I now have an adjunct position and with which I work extensively. All sounds good, right? And I should be happy to get a paper in Science too, right?

In reality, I am not pleased with how this paper has turned out. This is really due to two things. First, my collaborators did not tell me that the paper had been accepted in Science. Thus I did not find out about the paper until I did a Google search for some other reason and noticed this Deep-Sea News Blog, which had a story, well, about the paper in Science. It would of course have been nice to know the paper was accepted and coming out. It would have been even better to have seen the page proofs, which might have given me the chance to catch some little and not so little mistakes (e.g., the paper claims that this species has the largest genome of any intracellular symbiont sequenced to date – which is unfortunately not true). Now, admittedly I was out sick for a while and maybe my collaborators just did not want to bother me with this information. More likely, people were just very busy and this just slipped through the cracks.

But you know – it is a Science paper. I should be happy however it came into being, right? Well, no. Completely and thoroughly wrong. You see, I do not support publishing things in Science. I object because Science is not an Open Access journal. I tried and tried to get Irene Newton, the first author, to submit this to another journal. But in the end, she did the brunt of the work, and thus she and her advisor, Colleen, got to pick the place. And in the time since Irene submitted the paper, I have become even more militant against publishing in such non Open Access journals. Publishing in a non Open Access journal like Science makes me feel icky in every way. In addition, by choosing to publish the paper there rather than elsewhere, the field of deep sea symbionts may have been hurt rather than helped.

How could a Science paper hurt the field? Well, for one, Science with its page length obsession forced Irene to turn her enormous body of work on this genome into a single page paper with most of the detail cut out. I do not think a one page paper does justice to the interesting biology or to her work. A four page paper could have educated people about the ecosystems in the deep sea, about intracellular symbionts in general, and about this symbiosis in particular. The deep sea is wildly interesting, and also at some risk from human activities. This paper could have been used to do more than just promote someone’s resume (which really is the only reason to publish a one page paper in Science).

But of course, even more importantly, anyone without a subscription to Science, well, they can’t even read the paper. And AAAS gets to decide what happens to the text and figures in the future. So – count this as one of my papers I am not really proud of. I love that I helped my undergraduate advisor and one of my favorite people in the world do this work. But because the paper is not in an Open Access journal, I have unfortunately contributed to a system that I think is bad for the world. And I just feel icky.

Some news stories and blogs are coming out on the paper:

Below I have embedded a video of a dissection of what I think was a deep sea Calyptogena, just for the fun of it.

This was taken during a deep sea cruise I managed to get on. For more detail on this cruise, see the NOAA Ocean Explorers site here.