Misc. – Page 84 – Jonathan Eisen's Lab

A blast from the past: Plasmodium, plastids, phylogeny, and reproducibility

A few days ago I got an email from a colleague who I had not seen in many years. It was from Malcolm Gardner who worked at TIGR when I was there and is now at Seattle Biomed.

His email was related to the 2002 publication of the complete genome sequence of Plasmodium falciparum – the causative agent of most human malaria cases – for which he was the lead author. Someone had emailed Malcolm asking if he could provide details about the settings used in the blast searches that were part of the evolutionary analyses of the paper. The paper is freely available at Nature – at least for now – every once in a while the Nature Publishing Group seems to put it behind a paywall despite their promises not to.

Malcolm was contacting me because I had run / coordinated much of the evolutionary analysis reported in that paper. I note – as one of the only evolution focused people at TIGR it was pretty common for people to come to me and ask if I could help them with their genome. I pretty much always said yes since, well, I loved doing that kind of thing and it was really exciting in the early days of genome sequencing to be the first person to ask some evolution related question about the data.

Malcolm included the email he had received (which did not have a lot of detail) and he and I wrote back and forth trying to figure out exactly what this person wanted. And then I said, well, maybe the person should get in touch with me directly so I can figure out what they really want/need. It seemed unusual that someone was asking about something like that from a 10 year old paper, but, whatever.

As I was communicating with this person, I started digging through my files and my brain trying to remember exactly what had been done for this paper more than 10 years ago. I remember Malcolm and others from the Plasmodium community organizing some “jamborees” looking at the annotation of the genome. At one of those jamborees Malcolm and I met with some of the folks from the Sanger Center (which was one of the big players in the P. falciparum genome sequencing) and, after some discussion, I ended up doing three main things relating to the paper, which I describe below.

Thing 1: Conserved eukaryote genes

One of my analyses was to use the genome to look for genes conserved in eukaryotes but not present in bacteria or archaea. I did this to try and find genes that could be considered likely to have been invented on the evolutionary branch leading up to the common ancestor of eukaryotes.

As an aside, at about the same time I was asked to write a News and Views for Nature about the publication of the Schizosaccharomyces pombe genome. In the N&V I had written “Genome sequencing: Brouhaha over the other yeast” I noted how the authors had used the genome to do some interesting analysis of conserved eukaryotic genes. With the help of the Nature staff I had also made a figure which demonstrated (sort of) what they were trying to do in their analysis – which was to find genes that originated on the branch leading up to the common ancestor of the eukaryotes for which genomes were available at the time. As another aside – the S. pombe genome paper and my News and Views article are freely available …

Figure 1: The tree of life, with the branches labelled according to Wood et al.’s analysis of genes that might be specific to eukaryotes versus prokaryotes, and to multicellular versus single-celled organisms. Bacteria and archaea are prokaryotes (they do not have nuclei). From Nature 415, 845-848 (21 February 2002) | doi:10.1038/nature725. The eukaryotic portion of the tree is based on Baldauf et al. 2000.

Anyway, I did an analysis similar to the one in the S. pombe genome paper, found a reasonable number of such genes, and helped write a section for the paper on this.

Comparative genome analysis with other eukaryotes for which the complete genome is available (excluding the parasite E. cuniculi) revealed that, in terms of overall genome content, P. falciparum is slightly more similar to Arabidopsis thaliana than to other taxa. Although this is consistent with phylogenetic studies (64), it could also be due to the presence in the P. falciparum nuclear genome of genes derived from plastids or from the nuclear genome of the secondary endosymbiont. Thus the apparent affinity of Plasmodium and Arabidopsis might not reflect the true phylogenetic history of the P. falciparum lineage. Comparative genomic analysis was also used to identify genes apparently duplicated in the P. falciparum lineage since it split from the lineages represented by the other completed genomes (Supplementary Table B).

There are 237 P. falciparum proteins with strong matches to proteins in all completed eukaryotic genomes but no matches to proteins, even at low stringency, in any complete prokaryotic proteome (Supplementary Table C). These proteins help to define the differences between eukaryotes and prokaryotes. Proteins in this list include those with roles in cytoskeleton construction and maintenance, chromatin packaging and modification, cell cycle regulation, intracellular signalling, transcription, translation, replication, and many proteins of unknown function. This list overlaps with, but is somewhat larger than, the list generated by an analysis of the S. pombe genome (65). The differences are probably due in part to the different stringencies used to identify the presence or absence of homologues in the two studies.
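For readers who want to see the shape of this kind of filter, here is a minimal Python sketch. It is not the code used for the paper. It assumes the BLASTP results have already been reduced to (query, subject genome, E-value) tuples, and it uses the two E-value cutoffs given in the methods text quoted later in this post; the function name and the tuple layout are hypothetical.

```python
from collections import defaultdict

EUK_CUTOFF = 1e-30   # a match this strong is required in every eukaryotic genome
PROK_CUTOFF = 1e-15  # any prokaryotic match at or below this disqualifies a protein


def eukaryote_specific(hits, eukaryotes, prokaryotes):
    """Return query proteins matching all eukaryotes but no prokaryote.

    hits: iterable of (query_id, subject_genome, evalue) tuples (assumed layout).
    eukaryotes, prokaryotes: sets of genome names.
    """
    euk_seen = defaultdict(set)  # query -> eukaryotic genomes with a strong match
    prok_hit = set()             # queries with any qualifying prokaryotic match
    for query, genome, evalue in hits:
        if genome in eukaryotes and evalue <= EUK_CUTOFF:
            euk_seen[query].add(genome)
        elif genome in prokaryotes and evalue <= PROK_CUTOFF:
            prok_hit.add(query)
    required = set(eukaryotes)
    return sorted(q for q, seen in euk_seen.items()
                  if seen == required and q not in prok_hit)
```

The asymmetry between the two cutoffs is the point: presence in eukaryotes is judged strictly, absence from prokaryotes is judged loosely, so the resulting list errs on the side of being short.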

The list of genes is available as supplemental material on the Nature web site. Alas it is in MS Word format which is not the most useful thing. But more on that issue at the end of this post.

Thing 2: Searching for lineage specific duplications

Another aspect of comparative genomic analysis that I used to do for most genomes at TIGR was to look for lineage specific duplications (i.e., genes that have undergone duplications in the lineage of the species being studied to the exclusion of the lineages for which other genomes are available). The quick and dirty way we used to do this was to simply look for genes that had a better blast match to another gene from their own genome than to genes in any other genome. The list of genes we identified this way is also provided as a Word document in Supplemental materials.
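To make the quick and dirty approach concrete, here is a minimal Python sketch of the comparison. It is an illustration, not the script used at TIGR. It assumes the BLASTP hits are available as (query, subject, subject genome, E-value) tuples, and it takes the 10^-15 E-value cutoff from the methods text quoted later in this post; the function name and parameters are hypothetical.

```python
def lineage_specific_duplicates(hits, own_genome, max_evalue=1e-15):
    """Return queries whose best match in their own genome beats all others.

    hits: iterable of (query_id, subject_id, subject_genome, evalue) tuples
    (an assumed layout). Lower E-values are better matches.
    """
    best_self = {}   # query -> best E-value against its own genome
    best_other = {}  # query -> best E-value against any other genome
    for query, subject, genome, evalue in hits:
        if query == subject or evalue > max_evalue:
            continue  # skip self-hits and matches weaker than the cutoff
        table = best_self if genome == own_genome else best_other
        if evalue < table.get(query, float("inf")):
            table[query] = evalue
    # A query with no match in other genomes counts as a duplicate candidate.
    return sorted(q for q, e in best_self.items()
                  if e < best_other.get(q, float("inf")))
```

Note that ties go to the other genome, so two matches that both report an E-value of 0 are not called a duplication. A real analysis would probably fall back on bit scores in that case.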

Thing 3: Searching for organelle derived genes in the nuclear genome of P. falciparum

The third thing I did for the paper was to search for organelle derived genes in the nuclear genome of Plasmodium. Specifically I was looking for genes derived from the mitochondrial genome and plastid genome. For those who do not know, Plasmodium is a member of the Apicomplexa – all organisms in this group have an unusual organelle called the Apicoplast. Though the exact nature of this organelle had been debated, its evolutionary origins were determined by none other than Malcolm Gardner many years earlier (Gardner et al. 1994). They had shown that this organelle was in fact derived from chloroplasts (which themselves are derived from cyanobacteria). I am ashamed to say that before hanging out with Malcolm and talking about Plasmodium I did not know this. This finding of a chloroplast in an evolutionary group of eukaryotes that are not particularly closely related to plants is one of the key pieces of evidence in the “secondary endosymbiosis” hypothesis which proposes that some eukaryotes have brought into themselves as an endosymbiont a single-celled photosynthetic alga which had a chloroplast.

Anyway – here we were – with the first full genome of a member of the Apicomplexa. And we could use it to discover some new details on plastid evolution and secondary endosymbioses. So I adapted some methods I had used in analyzing the Arabidopsis genome (see Lin et al. 1999 and AGI 2000), and searched for plastid derived genes in the nuclear genome of Plasmodium. Why look in the nuclear genome for plastid genes? Or mitochondrial genes for that matter. Well, it turns out that genes that were once in the organelle genomes frequently move to the nuclear genome of their “host”. In fact, a lot of genes move. So – if you want to study the evolution of an organism’s organelles, it is sometimes more fruitful to look in the nuclear genome than in the actual organelle’s genome. OK – now back to the Plasmodium genome. What I was doing was trying to find genes in the nuclear genome that had once been in the plastid genome. How would you look for these?

To find mitochondrial-derived genes I did blast searches against the same database of genomes used to study the evolution of eukaryotes, but for this I looked for genes in Plasmodium that have decent matches to genes in alpha proteobacteria. For those I then built phylogenetic trees of each gene and its homologs, and screened through all the trees to look for any in which the gene from Plasmodium grouped inside a clade with sequences from alpha proteobacteria (allowing for mitochondrial genes from other eukaryotes to be in this clade).

To find plastid derived genes I did a similar screen except instead searched for genes that grouped in evolutionary trees with genes from cyanobacteria (or eukaryotic genes that were from plastids). The section of the paper that I helped write is below:

A large number of nuclear-encoded genes in most eukaryotic species trace their evolutionary origins to genes from organelles that have been transferred to the nucleus during the course of eukaryotic evolution. Similarity searches against other complete genomes were used to identify P. falciparum nuclear-encoded genes that may be derived from organellar genomes. Because similarity searches are not an ideal method for inferring evolutionary relatedness (66), phylogenetic analysis was used to gain a more accurate picture of the evolutionary history of these genes. Out of 200 candidates examined, 60 genes were identified as being of probable mitochondrial origin. The proteins encoded by these genes include many with known or expected mitochondrial functions (for example, the tricarboxylic acid (TCA) cycle, protein translation, oxidative damage protection, the synthesis of haem, ubiquinone and pyrimidines), as well as proteins of unknown function. Out of 300 candidates examined, 30 were identified as being of probable plastid origin, including genes with predicted roles in transcription and translation, protein cleavage and degradation, the synthesis of isoprenoids and fatty acids, and those encoding four subunits of the pyruvate dehydrogenase complex. The origin of many candidate organelle-derived genes could not be conclusively determined, in part due to the problems inherent in analysing genes of very high (A + T) content. Nevertheless, it appears likely that the total number of plastid-derived genes in P. falciparum will be significantly lower than that in the plant A. thaliana (estimated to be over 1,000). Phylogenetic analysis reveals that, as with the A. thaliana plastid, many of the genes predicted to be targeted to the apicoplast are apparently not of plastid origin. Of 333 putative apicoplast-targeted genes for which trees were constructed, only 26 could be assigned a probable plastid origin. 
In contrast, 35 were assigned a probable mitochondrial origin and another 85 might be of mitochondrial origin but are probably not of plastid origin (they group with eukaryotes that have not had plastids in their history, such as humans and fungi, but the relationship to mitochondrial ancestors is not clear). The apparent non-plastid origin of these genes could either be due to inaccuracies in the targeting predictions or to the co-option of genes derived from the mitochondria or the nucleus to function in the plastid, as has been shown to occur in some plant species (67).
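The tree screening step described before the quoted section can be sketched in a few lines of Python. This is a simplified illustration, not what was actually run for the paper: it assumes each tree has already been parsed and rooted into nested tuples of leaf names (neighbour-joining trees are unrooted, so the rooting is an extra assumption), and it only asks whether everything in the sister group of the Plasmodium gene belongs to an allowed set, such as alpha proteobacteria plus mitochondrial genes. All names are hypothetical.

```python
def leaves(tree):
    """Return the leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def sister_leaves(tree, target):
    """Return the leaves that share the target's immediate parent, or None."""
    if isinstance(tree, str):
        return None
    if target in tree:  # the target is a direct child of this node
        return [n for child in tree if child != target for n in leaves(child)]
    for child in tree:
        found = sister_leaves(child, target)
        if found is not None:
            return found
    return None  # target is not in this subtree


def groups_with(tree, target, allowed):
    """True if the target has sisters and every one of them is in `allowed`."""
    sisters = sister_leaves(tree, target)
    return bool(sisters) and all(name in allowed for name in sisters)
```

For the plastid screen the same check applies with the allowed set swapped for cyanobacteria plus plastid-encoded genes. Looking only at the immediate sister group is a strict criterion; a looser screen could walk further up the tree.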

Thing 4: Analysis of DNA repair genes

Arnab Pain from the Sanger Center and I analyzed genes predicted to be involved in DNA repair and recombination processes and wrote a section for the paper:

DNA repair processes are involved in maintenance of genomic integrity in response to DNA damaging agents such as irradiation, chemicals and oxygen radicals, as well as errors in DNA metabolism such as misincorporation during DNA replication. The P. falciparum genome encodes at least some components of the major DNA repair processes that have been found in other eukaryotes (111, 112). The core of eukaryotic nucleotide excision repair is present (XPB/Rad25, XPG/Rad2, XPF/Rad1, XPD/Rad3, ERCC1) although some highly conserved proteins with more accessory roles could not be found (for example, XPA/Rad4, XPC). The same is true for homologous recombinational repair with core proteins such as MRE11, DMC1, Rad50 and Rad51 present but accessory proteins such as NBS1 and XRS2 not yet found. These accessory proteins tend to be poorly conserved and have not been found outside of animals or yeast, respectively, and thus may be either absent or difficult to identify in P. falciparum. However, it is interesting that Archaea possess many of the core proteins but not the accessory proteins for these repair processes, suggesting that many of the accessory eukaryotic repair proteins evolved after P. falciparum diverged from other eukaryotes.

The presence of MutL and MutS homologues including possible orthologues of MSH2, MSH6, MLH1 and PMS1 suggests that P. falciparum can perform post-replication mismatch repair. Orthologues of MSH4 and MSH5, which are involved in meiotic crossing over in other eukaryotes, are apparently absent in P. falciparum. The repair of at least some damaged bases may be performed by the combined action of the four base excision repair glycosylase homologues and one of the apurinic/apyrimidinic (AP) endonucleases (homologues of Xth and Nfo are present). Experimental evidence suggests that this is done by the long-patch pathway (113).

The presence of a class II photolyase homologue is intriguing, because it is not clear whether P. falciparum is exposed to significant amounts of ultraviolet irradiation during its life cycle. It is possible that this protein functions as a blue-light receptor instead of a photolyase, as do members of this gene family in some organisms such as humans. Perhaps most interesting is the apparent absence of homologues of any of the genes encoding enzymes known to be involved in non-homologous end joining (NHEJ) in eukaryotes (for example, Ku70, Ku86, Ligase IV and XRCC1)(112). NHEJ is involved in the repair of double strand breaks induced by irradiation and chemicals in other eukaryotes (such as yeast and humans), and is also involved in a few cellular processes that create double strand breaks (for example, VDJ recombination in the immune system in humans). The role of NHEJ in repairing radiation-induced double strand breaks varies between species (114). For example, in humans, cells with defects in NHEJ are highly sensitive to γ-irradiation while yeast mutants are not. Double strand breaks in yeast are repaired primarily by homologous recombination. As NHEJ is involved in regulating telomere stability in other organisms, its apparent absence in P. falciparum may explain some of the unusual properties of the telomeres in this species (115).

Back to the story
Anyway … back to the story. I do not have current access to all of TIGR’s old computer systems which is where my searches for the genome paper reside. But I figured I might have some notes somewhere on my computer about what blast parameters I used for these searches. And amazingly I did. As I was getting ready to write back to Malcolm and to the person who had asked for the information I decided to double check to see what was in the paper. And amazingly, much of the detail was right there all along.

Plasmodium falciparum proteins were searched against a database of proteins from all complete genomes as well as from a set of organelle, plasmid and viral genomes. Putative recently duplicated genes were identified as those encoding proteins with better BLASTP matches (based on E value with a 10^-15 cutoff) to other proteins in P. falciparum than to proteins in any other species. Proteins of possible organellar descent were identified as those for which one of the top six prokaryotic matches (based on E value) was to either a protein encoded by an organelle genome or by a species related to the organelle ancestors (members of the Rickettsia subgroup of the α-Proteobacteria or cyanobacteria). Because BLAST matches are not an ideal method of inferring evolutionary history, phylogenetic analysis was conducted for all these proteins. For phylogenetic analysis, all homologues of each protein were identified by BLASTP searches of complete genomes and of a non-redundant protein database. Sequences were aligned using CLUSTALW, and phylogenetic trees were inferred using the neighbour-joining algorithms of CLUSTALW and PHYLIP. For comparative analysis of eukaryotes, the proteomes of all eukaryotes for which complete genomes are available (except the highly reduced E. cuniculi) were searched against each other. The proportion of proteins in each eukaryotic species that had a BLASTP match in each of the other eukaryotic species was determined, and used to infer a ‘whole-genome tree’ using the neighbour-joining algorithm. Possible eukaryotic conserved and specific proteins were identified as those with matches to all the complete eukaryotic genomes (10^-30 E-value cutoff) but without matches to any complete prokaryotic genome (10^-15 cutoff).
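As an illustration of the "top six prokaryotic matches" rule in the methods text above, here is a minimal Python sketch. It is one reading of that sentence, not a reconstruction of the original pipeline: it assumes hits arrive as (query, subject source, E-value) tuples, pools prokaryotic and organelle-encoded matches into one ranking, and flags a protein if any of its six best matches in that pool comes from an organelle genome or an organelle-ancestor relative. The function and parameter names are hypothetical.

```python
def organelle_candidates(hits, prokaryotes, organelle_like, top_n=6):
    """Return queries with an organelle-like source among their top matches.

    hits: iterable of (query_id, subject_source, evalue) tuples (assumed layout).
    prokaryotes: set of ordinary prokaryotic source names.
    organelle_like: set of organelle genomes and organelle-ancestor relatives.
    """
    by_query = {}
    for query, source, evalue in hits:
        # Eukaryotic nuclear matches are ignored for this ranking.
        if source in prokaryotes or source in organelle_like:
            by_query.setdefault(query, []).append((evalue, source))
    candidates = []
    for query, matches in by_query.items():
        top = sorted(matches)[:top_n]  # lowest E-values first
        if any(source in organelle_like for _, source in top):
            candidates.append(query)
    return sorted(candidates)
```

Candidates flagged this way would then go on to the tree-building step, since a BLAST ranking alone says little about evolutionary history.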

Alas, I cannot for the life of me find what other parameters I used for the blastp searches. I am 99.9999% sure I used default settings but alas, I don’t know what default settings for blast were in that era. And I am not even sure which version of blastp was installed on the TIGR computer systems then. I certainly need to do a better job of making sure everything I do is truly reproducible.
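One low-effort habit that would have saved me here is writing a small manifest next to every set of search results. The Python sketch below shows the idea. It is only a sketch: the field names, the function names and the choice of JSON are all my own assumptions, and the tool version has to be supplied by whoever runs the search, because nothing here queries the program itself.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(tool, version, parameters, input_paths):
    """Collect what is needed to rerun an analysis step (hypothetical layout)."""
    return {
        "tool": tool,
        "version": version,                # as reported by the tool, typed in by hand
        "parameters": dict(parameters),    # every setting, including defaults
        "inputs": {p: sha256_of(p) for p in input_paths},
        "python": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def write_manifest(manifest, path):
    """Write the manifest as sorted, indented JSON next to the results."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

The important part is recording the defaults explicitly. "I used default settings" stops being useful the moment nobody remembers what the defaults were.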

Reproducibility

This all brings me to the actual real part of this story. Reproducibility. It is a big deal. Anyone should be able to reproduce what was done in a study. And alas, it is difficult to do that when not all the methods are fully described. And one should also provide intermediate results so that people do not have to redo everything you did in a study but can just reproduce part of it. It would be good to have, for example, released all the phylogenetic trees from the analysis of organellar genes in Plasmodium. Alas, I do not seem to have all of these files as they were stored in a directory at TIGR dedicated to this genome project and as I am no longer at TIGR I do not have ready access to that material. It is probably still lounging around somewhere on the JCVI computer systems (TIGR alas, no longer officially exists … it was swallowed by the J. Craig Venter Institute …). But I will keep digging and I will post them to some place like FigShare if/when I find them.

Perhaps more importantly, I will be working with my lab to make sure that in the future we store/record/make available EVERYTHING that would allow people to reproduce, re-analyze, re-jigger, re-whatever anything from our papers.

The key lesson – plan in advance for how you are going to share results, methods, data, etc …

Profile of Michael Turelli in the Sacramento Bee

Pretty good profile of Michael Turelli in the Sacramento Bee: UCD professor Michael Turelli finds biomathematics work ‘ridiculously satisfying’ – Living Here – The Sacramento Bee. It discusses his career from PhD work to early research to his new work on Wolbachia. Note of lack of objectivity on my part – Turelli was the first person to recruit me to UC Davis and, well, I love him. He simply is great …

Rosacea – What Causes It? News story overplays suggested connection to skin mites

Just got done reading this: Could Bacteria in Skin Mites Help Cause Rosacea? – US News and World Report. The article leads off with a bold statement that caught my eye

“Bacteria carried by tiny mites on the skin might be responsible for the common dermatological condition known as rosacea, researchers say.”

This caught my attention because I have been reading up on skin microbes recently and though many have suggested connections between microbes and rosacea as far as I know nobody has shown any causal relationship. And causation vs. correlation has been on my mind a lot recently.

So I read further and found some suggestive but inconclusive statements that were linked together

there are more of these mites on the skin of patients with rosacea than on those without
a bacterium (Bacillus oleronius) has been found in the mites and in people w/ rosacea
this bacterium can be killed with the same antibiotics that seem to have some success in treating rosacea
people with rosacea have an immune reaction to compounds from this bacterium
another bacterium, Staphylococcus epidermidis, also appears in patients w/ rosacea but not patients free of rosacea

And that apparently was it … not very convincing. Sounds like just a lot of random correlations to me. So I decided to dig deeper. And I went to see if I could find the paper which alas was not linked from the news story.

I googled the journal name “Journal of Medical Microbiology” and got to the web site. The news article had said the “review paper” had come out August 30th so I clicked on the Papers In Press link and got to the paper. I browsed the abstract, which seemed somewhat different from the gist of the news story

Rosacea is a common dermatological condition that predominantly affects the central regions of the face. Rosacea affects up to 3% of the world’s population and a number of subtypes are recognized. Rosacea can be treated with a variety of antibiotics (e.g. tetracycline or metronidazole) yet no role for bacteria or microbes in its aetiology has been conclusively established. The density of Demodex mites in the skin of rosacea patients is higher than in controls, suggesting a possible role for these mites in the induction of this condition. In addition, Bacillus oleronius, known to be sensitive to the antibiotics used to treat rosacea, has been isolated from a Demodex mite from a patient with papulopustular rosacea and a potential role for this bacterium in the induction of rosacea has been proposed. Staphylococcus epidermidis has been isolated predominantly from the pustules of rosacea patients but not from unaffected skin and may be transported around the face by Demodex mites. These findings raise the possibility that rosacea is fundamentally a bacterial disease resulting from the over proliferation of Demodex mites living in skin damaged as a result of adverse weathering, age or the production of sebum with an altered fatty acid content. This review surveys the literature relating to the role of Demodex mites and their associated bacteria in the induction and persistence of rosacea and highlights possible therapeutic options.

And then I did what usually causes me much anguish when I am at home – I clicked on the link for the full text, thinking that I would get a paywall. And lo and behold, I got the preprint of the paper. The paper is quite interesting in many ways with lots of details about these mites I knew nothing about. It also has a lot of detail on these two bacterial species and why the authors think they are of interest in rosacea etiology. But no convincing evidence of any kind is presented that there is a causal connection to these bacteria or to these mites. I leave everyone with the last paragraph of the paper

The pathogenic role of Demodex mites, as well as B. oleronius and S. epidermidis, in the induction and persistence of rosacea remains an unresolved issue. The lack of an immunological response to Demodex mites in healthy skin raises the possibility of localized immunosuppression, facilitating the survival of the mite. Hopefully, the results of further research will bring us closer to understanding the role of microbes in the pathogenesis of rosacea and assist in the development of new and more effective therapies for the treatment of this disfiguring disease.

I agree. Unresolved.

Winner of the "genome conference speakers should be male" award …

Presenters at the World Genome Data Analysis Summit. Women highlighted in yellow.

Richard LeDuc, Manager, National Center for Genome Analysis Support, Indiana University
Gholson Lyon, Assistant Professor, Cold Spring Harbour Laboratory
Christopher Mason, Assistant Professor, Cornell University
Liz Worthey, Assistant Professor, Medical College of Wisconsin
Garry Nolan, Professor of Genetics, Stanford University
David Dooling, Assistant Director, Genome Institute, Washington University
Peter Robinson, Senior Technical Marketing Manager, DataDirect Networks
Thomas Keane, Senior Scientific Manager, Sequencing Informatics, Wellcome Trust Sanger Institute
Eric Fauman, Associate Research Fellow, Pfizer
Geetha Vasudevan, Assistant Director and Bioinformatics Scientist, Bristol-Myers Squibb
Shanrong Zhao, Senior Scientist, Johnson & Johnson
Bill Barnett, Director, National Center for Genome Analysis Support, Indiana University
Zemin Zhang, Senior Scientist, Bioinformatics, Computational Biology, Genentech
Christopher Mason, Assistant Professor, Cornell University
James Cai, Head, Disease & Translational Informatics, Roche
Eric Zheng, Fellow of Bioinformatics Science, Regeneron
Monica Wang, Associate Director, Knowledge Engineering, Millennium
Joachim Theilhaber, Lead Bioinformatics Research Investigator, Sanofi
Francisco De La Vega, Visiting Scholar, Stanford University
Don Jennings, Manager of Data Integration, Enterprise Information Management, Eli Lilly
Deepak Rajpal, Senior Scientific Investigator, Computational Biology, GSK
Mark Schreiber, Associate Director, Knowledge Engineering, Novartis

So that is a ratio of 19:3 for a whopping 13.6% women. Please – I beg of you – if you are organizing a conference give some thought to the diversity of speakers. In my experience the best conferences have always ended up being ones with highly diverse speakers. These conferences were good probably because the organizers put a lot of thought into who to invite to speak, rather than just inviting either big names or people that one knew in some way.

UPDATE: It has been pointed out that I listed one person (Chris Mason) twice — so it is only an 18:3 ratio. Phew. Much better.
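For anyone who wants to check my arithmetic, here is a tiny Python sketch of the percentage calculation. The function name is just something I made up for illustration.

```python
def percent_women(men, women):
    """Return the share of women among all speakers, as a percentage."""
    return round(100 * women / (men + women), 1)


original = percent_women(19, 3)   # the count before the correction
corrected = percent_women(18, 3)  # after removing the duplicate entry
```

With the duplicate removed the share moves from 13.6% to 14.3%, which is why "much better" above should be read with some sarcasm.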

For other posts on this topic see

Velasquez-Manoff opinion piece in the NY Times on autism, parasites & inflammation; nice ideas; not enough caveats

There is a very interesting “Opinion” piece in the New York Times today: Immune Disorders and Autism – NYTimes.com. Written by Moises Velasquez-Manoff, it details some recent work that the author believes relates to autism and a variety of other human ailments with an autoimmune connection.

The general logic/key points seem to be as follows:

Some autism cases look like a form of inflammatory diseases with the immune system overactive (inflammation on high, anti-inflammation on low, or some combination thereof)
Infection of a mother during pregnancy increases the risk of having a child with autism.
In animal models, inducing inflammation in the mother (even without an infection) leads to an increased risk of behavioral “problems” in her offspring
Inflammatory and/or autoimmune diseases (e.g., asthma) have increased in incidence along with autism.
If a mother has autoimmune or inflammatory diseases such as rheumatoid arthritis or celiac disease she has a higher risk of having a child with autism. Similarly if a mother has allergies or asthma during the second trimester, there is a higher risk of having children with autism.
Many autoimmune and inflammatory disorders and autism are all more prevalent in the developed world.
The developed world is generally cleaner than the developing world.
There are many fewer parasites in people in the developed world.
Parasites are known to suppress inflammation.
Therefore, we may be able to stop/limit autism, asthma, and other inflammatory diseases by purposefully infecting people with parasites from our evolutionary past.

Now, personally, I like the general hypothesis here. It makes complete sense. But alas, it suffers from this issue that is spreading almost as fast as these diseases – a lack of a discussion of the distinction between correlation and causation. I have been obsessing about this a bit recently with studies of the microbiome. Overall, I do like this current article. It mixes human epidemiological studies with controlled animal studies with discussion of conceptual models. But alas there is really no discussion of the challenges of disentangling correlation vs. causation. And I think it is a bit dangerous in the latter parts with the jump to potentially curing these various ailments by purposeful infection with parasites. Again, I like the idea. But a few caveats would have been nice. I am glad it was marked as an opinion piece but even when one states an opinion about a medical issue, one can still say “there are reasons why this might not be true .. such as …”. Too bad that wasn’t done here.

UPDATE – Emily Willingham has written a VERY detailed critique of the article that I think everyone interested in anything related to this topic should read: Emily Willingham: Autism, immunity, inflammation, and the New York Times (www.emilywillinghamphd.com).

Notes from some recent meetings about microbiology of the built environment

Quick post here. At the microBEnet site that I run we have posted some notes, slides, videos and other information from a few recent meetings on the topic of “microbiology of the built environment” that may be of interest so I am posting links here

"Genomics: the Power and the Promise" meeting – could be called "Men Studying Genomics" instead

Just got another email advertising this meeting: Genomics: the Power and the Promise. Organized by Genome Canada and the Gairdner Foundation. And, well, though I love some of the things Genome Canada has done, this conference really sticks in my craw. Why? It has a serious male speaker overabundance. Here is the list of speakers:

Day 1

Pierre Muelien
John Dirks
Gary Goodyear
Eric Lander
Craig Venter
Philip Sharp
Svante Paabo
Tom Hudson
Peter Jones
Stephen Scherer
Michael Hayden
Bertha Maria Knoppers

Day 2

Stephen Mayfield
Elizabeth Edwards
Curtis Suttle
Peter Langridge
Michel Georges
William Davidson
Klaus Ammann

That is 17:2 male: female ratio. That is one female speaker per day. Not impressive.

On Day 2 there are two panels (which generally I do not count as “speakers” but at least there are a few more women on these):

Panel 1: Sally Aitken, Vincent Martin, Elizabeth Edwards, Curtis Suttle, Gerrit Voordouw, Steve Yearley
Panel 2: William Davidson, Martine Dubuc, Isobel Parkin, Graham Plastow, Curtis Pozniak, Peter Phillips

So if you count these that then comes to a ratio of presenters of 25:6. Do I want quotas for meetings? No, but given that the ratio of men:women in biology is close to 1:1 this suggests to me some sort of bias. Where does this bias come from? I don’t know. Could be at the level of who gets invited. Could be at the level of who accepts. Could be some non obvious criterion for selecting speakers that leads to a bias towards men. I don’t know. But I personally think they could do better. And I note – they could probably do better in terms of other aspects of diversity of speakers, but I am focusing here just on the male vs. female ratio. Again, I am not suggesting one should have quotas for all meetings but at the same time, a 17:2 male to female speaker ratio suggests something could use some working on.

As a side story I decided to look at some past conferences sponsored by Genome Canada. I worked my way down the list … see below:

2008 Joint IUFRO-CTIA International conference. Speakers: 8:2 male: female
6th Canadian Plant Genomics Workshop Plenary Speakers 8:2
8th Annual International Conference of the Canadian Proteomics Initiative. See below. 32:2 male to female. I have no idea what the ratio is in the field of proteomics but this is a very big skew in the ratio. 94% male.

Leigh Anderson (Plasma Proteome Institute)
Ron Beavis (UBC)
John Bergeron (McGill)
Christoph Borchers (UVic)
Jens Coorssen (U Calgary)
Al Edwards (U Toronto)
Andrew Emili (U Toronto)
Leonard Foster (UBC)
Jack Greenblatt (U Toronto)
Juergen Kast (UBC)
Gilles Lajoie (U Western Ontario)
Liang Li (U Alberta)
John Marshall (Ryerson)
Susan Murch (UBC Okanagan)
Richard Oleschuk (Queens)
Dev Pinto (NRC)
Guy Poirier (Laval)
Don Riddle (UBC)
David Schriemer (University of Calgary)
Christoph Sensen (University of Calgary)
Michael Siu (York)
John Wilkins (University of Manitoba)
David Wishart (University of Alberta)
Robert McMaster (University of British Columbia)
Peter Liu (University of Toronto)
Christopher Overall (University of British Columbia)
John Kelly (NRC, Ottawa)
Joshua N. Adkins (Pacific Northwest National Laboratory, USA)
Dustin N.D. Lippert (University of British Columbia)
David Juncker (McGill University)
Jenya Petrotchenko (University of Victoria)
Detlev Suckau (Bruker Daltonik GmbH)
Peipei Ping (University of California)
Robert McMaster (University of British Columbia)

I couldn’t bear to go on any further.

Now – note – I am not accusing anyone of bias here. But I do think it might be a good idea for Genome Canada to put some more effort into figuring out why the conferences they sponsor have such skewed ratios. And perhaps they can try to do something about this. For more on this issue from my blog see

Referring to 16S surveys as "metagenomics" is misleading and annoying #badomics #OmicMimicry

Aargh. I am a big fan of ribosomal RNA based surveys of microbial diversity. Been doing them for 20+ years and still continue to – even though I have moved on to more genomic/metagenomic based studies. But it drives me crazy to see rRNA surveys now being called “metagenomics”.

Here are some examples of cases where rRNA surveys are referred to as metagenomics:

Deep 16S rRNA metagenomics and quantitative PCR analyses of the premature infant fecal microbiota … – Wow — rRNA as metagenomics even made it into the title here
Paper: 16S rRNA metagenomics-based survey of oral biofilms in obese children… Poster Abstract
Gastroenterology & Endoscopy News – Studies Link Composition of … News Story
Metagenomic study of the oral microbiota by Illumina high-throughput sequencing – paper is only about rRNA sequencing, not metagenomics
Dr. Dag Harmsen publishes first 16S metagenomic study on the Ion PGM Sequencing … a Youtube video highlighting a PLoS One paper
Bacterial Community Shift in Treated Periodontitis Patients Revealed by Ion Torrent 16S rRNA Gene Amplicon Sequencing – PLOS One paper from the video above
EUREKA GENOMICS | 16S metagenomic analysis service – a company pushing their services
A Metagenomic Approach to Characterization of the Vaginal Microbiome Signature in Pregnancy … PLoS One paper

I found these examples in about five minutes of googling. I am sure there are many many more.

Why does this drive me crazy? Because rRNA surveys focus on a single gene. They are not genomicy in any way. Thus it is misleading to refer to rRNA surveys as “metagenomics”. Why do people do this? I think it is pretty simple. Genomics and metagenomics are “hot” topics. To call what one is doing “metagenomics” makes it sound special. Well, just like adding an “omic” suffix does not make one's work genomics – falsely labeling work as some kind of “omics” also does not make it genomics.

Enough of this. If you are doing rRNA surveys of microbial communities – great – I love them. But do not refer to this work as metagenomics. If you do, you are being misleading, either accidentally or on purpose. So I think I need a new category of #badomics – “Omic Mimicry” or something like that …

——————————

Note – this post was spurred on by a Twitter conversation – which is captured below (note – I am certain I have complained about this before but cannot find a record of it …)

Well, this software might be useful for #metagenomics but drives me crazy when people refer to 16S PCR as #metagenomics plosone.org/article/info:d…
— Jonathan Eisen (@phylogenomics) August 22, 2012

@phylogenomics why such a strong sentiment? Do you have a good definition of the field of metagenomics that “excludes” the 16S work?
— Rob Hooft (@rwwh) August 22, 2012

@phylogenomics Yes, that would cut out all my favourite microbes.
— Sarah Watkinson (@philonotis) August 22, 2012

@rwwh metagenomics = sampling/analysis/study of the genomes from community/sample; rRNA PCR (which I love) is not about genomics
— Jonathan Eisen (@phylogenomics) August 22, 2012

@xquickfixx actually it is sadly getting more and more common …
— Jonathan Eisen (@phylogenomics) August 22, 2012

How to find an Open Access journal for submitting your paper(s) #Jane #DOAJ

Got asked a question on Twitter that seems worthwhile to post here

@phylogenomics any suggestions for appropriate open access journal for a geomicrobio/clay mineral – Cr reduction/ kinetics paper?
— Paul Glasser (@glasserp) August 21, 2012

Well, @glasserp a good place to start is “Directory of Open Access Journals” doaj.org
— Jonathan Eisen (@phylogenomics) August 21, 2012

Also @glasserp a great tool is Journal/Author Name Estimator “JANE” biosemantics.org/jane/ – w/ “extra options” you can choose access policies
— Jonathan Eisen (@phylogenomics) August 21, 2012

And @glasserp for JANE just paste in your abstract or key words and search journals and it will ID good candidates
— Jonathan Eisen (@phylogenomics) August 21, 2012

Basically what I was suggesting was two possible steps. The first is to search the Directory of Open Access Journals, which is a great place to browse to see what the possibilities are. Another great resource/tool is JANE – the Journal/Author Name Estimator. I love Jane and use it all the time (if interested also see the paper on Jane here). The default screen for Jane looks like this:

And you can certainly use the default options. Just type in some keywords, or copy and paste a document or abstract of a paper and select “Find Journals” and voila you get some suggested journals which match your text. So for example if I paste in “evolution genomes novelty phylogeny microbes” and search for journals I get some useful suggested journal matches

And you can also select the “show articles” option which will, well, show you some of the article matches

Also you can even export the citations, which is a nice option for adding references to various collections you might have or for looking later.

You can also look for authors or articles that match your text/keywords instead of journals. The “find authors” option is great for searching for possible reviewers if you are handling the review of a paper (or a grant).

But my favorite part of Jane is what you can do with the “Show extra options” option. This is the menu you get

This allows one to search for kinds of articles as well as for kinds of access. For example, if I select “only journals with immediate access” I get a list of places I would submit papers

I am sure there are other resources out there but I particularly like these two … Any other suggestions from the world out there?

Wow – who would have thought? Microbes are central to election in Wyoming

Fascinating story from a microbiology point of view: Republican candidates disagree on water rights in Yellowstone. Seems that one of the three main candidates in this Republican primary election is focusing partly on microbes. Here are some microbial quotes from the story:

“Jennings fervently believes that the microbes found in Yellowstone National Park’s boiling waters should be working for Wyoming, generating royalties to help fund state programs. The notion has received criticism from Anderson and Radosevich.”

“Radosevich simply refuted the notion that the state should seek monetary gain from Yellowstone microbes in the first place”

“Jennings maintains that Wyoming is sitting on an “enormous bank of microbes” that have yet to be discovered. “

See more on this issue:
