sequencing – Jonathan Eisen's Lab

Cold Spring Harbor presents the men’s only view on the evolution of sequencing

On June 5 I posted a guest blog post by an anonymous person writing about the Programming for Biology workshop at Cold Spring Harbor Labs: Guest post on Yet Another Mostly Male Meeting (YAMMM) – Programming for Biology

And this post generated some responses, including, yesterday, a series of replies from whoever is behind the Cold Spring Harbor Meetings Twitter account.

@phylogenomics @mike_schatz We do have a role. Course instructors develop speaker lists but we work with them, especially on diversity. 1/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz Gender, race/ethnicity, U.S. representation, and geographic region are some ways we look at diversity. 2/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz The gender skew in the 2014 Programming course came up last year and we talked to Simon and Sofia about it. 3/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz Simon & Sofia teach a great course that is always very well received and evaluated by its students. 4/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

@phylogenomics @mike_schatz They’ll teach a great course again this year and ensure its list of guest lecturers includes more women. 5/5

— CSHL Meetings (@cshlmeetings) June 23, 2015

Sounds great. And I retweeted all of these.

And then I got an email invite to a new Cold Spring Harbor Meeting: The Evolution of Sequencing Technology: A Half Century of Progress

With a long, long list of speakers. Alas, the gender ratio of speakers here is abysmal. I have highlighted men in yellow and women in green (with the caveat that I always try to give: assigning gender from names or appearance or records is not always accurate).

Mark Adams, J. Craig Venter Institute
Gillian Air, University of Oklahoma
Shankar Balasubramanian, University of Cambridge, UK
Hagan Bayley, Oxford Nanopore Technologies, Ltd.
David Bentley, Illumina Cambridge, Ltd
Sydney Brenner, Salk Institute for Biological Studies
Nigel Brown, University of Edinburgh, UK
George Brownlee, University of Oxford, UK
Graham Cameron, EMBL Australia Bioinformatics Resource
Piero Carninci, RIKEN Ctr. for Life Science Technologies, Japan
Norman Dovichi, University of Notre Dame
J. William Efcavitch, Molecular Assemblies, Inc.
Miguel Garcia-Sancho, University of Edinburgh, UK
Mark Gerstein, Yale University
Jack Gilbert, University of Chicago
Walter Gilbert, Harvard University
Philip Green, University of Washington
Leroy Hood, Institute for Systems Biology
Clyde Hutchison, J. Craig Venter Institute
James Kent, University of California, Santa Cruz
Jonas Korlach, Pacific Biosciences
Victor Ling, BC Cancer Agency, Canada
David Lipman, NCBI/NLM National Institutes of Health
James Lupski, Baylor College of Medicine
Thomas Maniatis, Columbia University Medical Center
W. Richard McCombie, Cold Spring Harbor Laboratory
Joachim Messing, Waksman Institute, Rutgers University
Gene Myers, Max Planck Institute of Molecular Cell Biology & Genetics, Germany
Richard Myers, HudsonAlpha Institute for Biotechnology
Debbie Nickerson, University of Washington
James Ostell, NLM/NCBI
Stephen Quake, Stanford University/HHMI
Charles Richardson, Harvard Medical School
Richard Roberts, New England BioLabs
Jane Rogers, The Genome Analysis Centre, UK
Mostafa Ronaghi, Illumina, Inc.
Yoshiyuki Sakaki, University of Tokyo
Jay Shendure, University of Washington
Melvin Simon, Caltech
Hamilton Smith, J. Craig Venter Institute
Lloyd Smith, University of Wisconsin-Madison
J. Craig Venter, J. Craig Venter Institute
Robert Waterston, University of Washington
James Watson, Cold Spring Harbor Laboratory
Jean Weissenbach, Genoscope, France
Barbara Wold, Caltech
Huanming Yang, Beijing Genomics Institute, China

That is right. 47 speakers. 4 of whom are female. For a whopping 8.5 % female speakers. This is one of the most extreme skews I have seen for any meeting. This truly makes me sick to my stomach. Since there are plenty of women who have had and still have fundamentally important roles in the field of sequencing and sequencing technology, I infer that this most likely reflects some type of bias in the meeting organization and planning process.

The meeting page lists the organizers as

Mark Adams, J. Craig Venter Institute
Nigel Brown, University of Edinburgh, UK
Mila Pollock, Cold Spring Harbor Laboratory
Robert Waterston, University of Washington

And one of the major sponsors as Illumina.

I think they all have some explaining to do.

One last note – the meeting description says “The opening session will include a tribute to Frederick Sanger, the father of DNA sequencing, and will cover the early efforts in protein, RNA and DNA sequencing.” Really? The father of DNA sequencing? Seems perfect for this meeting I guess.

UPDATE 6/29/15 7 PM PST

Apparently this meeting is part of a series on the history of molecular biology. The meeting page says

The CSHL/Genentech Center Conferences on the History of Molecular Biology & Biotechnology (http://library.cshl.edu/hosted-meetings) aim to explore important themes of discovery in the biological sciences, bringing together scientists who made many of the seminal discoveries that began the field with others whose interests may include the current status of the field, the historical progress of the field, and/or the application of these techniques and approaches in biotechnology and medicine. Previous meetings in the series have included:

Biotechnology: Past, Present & Future (2008)
History of Restriction Enzymes (2013)
Messenger RNA: From Discovery to Synthesis and Regulation in Bacteria and Eukaryotes (2014)
Plasmids: History & Biology (2014)

So I decided to take a peek at these meetings. I started with Biotechnology: Past, Present & Future (2008).

Organizers

Mila Pollock
Jan Witkowski

Advisors

Sydney Brenner
Peter Feinstein
Lee Hood
Tom Maniatis
Richard Roberts

Speakers are listed below:

Garen Bohlin
Robert Bud
Don Comb
Peter Feinstein
Maryann Feldman
Herbert Heyneker
John H. Leamon
Yuk-Lam Lo
Alan McHughen
Stelios Papadopoulos
Rich Roberts
Robert Steinbrook
Kenneth Thibodeau
Marc Van Montagu
Charles Weissmann
Julie Xing

For speakers that comes to 14:2 male:female or 12.5 % female

Next I went to History of Restriction Enzymes (2013).

Organizers

Herb Boyer, University of California, San Francisco
Stu Linn, University of California, Berkeley
Mila Pollock, Cold Spring Harbor Laboratory
Richard Roberts, New England BioLabs

Speakers are listed below:

Aneel Aggarwal, Mount Sinai School of Medicine
Werner Arber, University of Basel, Switzerland
Tom Bickle, University of Basel, Switzerland
Herb Boyer, University of California, San Francisco
Jack Chirikjian, Georgetown University
Steve Halford, Bristol University, United Kingdom
Ken Horiuchi, The Rockefeller University
Clyde Hutchison, J. Craig Venter Institute
Arvydas Janulaitis, Institute of Biotechnology, Lithuania
Stu Linn, University of California, Berkeley
Bill Linton, Promega
Arvydas Lubys, Institute of Biotechnology, Lithuania
Matthew Meselson, Harvard University
Rick Morgan, New England BioLabs
Andrzej Piekarowicz, Warsaw University, Poland
Alfred Pingoud, Institute of Biochemistry – Giessen, Germany
Mila Pollock, Cold Spring Harbor Laboratory
Rich Roberts, New England BioLabs
John Rosenberg, University of Pittsburgh
Ham Smith, J. Craig Venter Institute
Bruno Strasser, Yale University & University of Geneva
Geoff Wilson, New England BioLabs

OK that is 21:1 or 4.5 % women. Well, I guess this makes the meeting on sequencing look good.

So then I went to “Messenger RNA: From Discovery to Synthesis and Regulation in Bacteria and Eukaryotes (2014)“. Speakers are below:

Organizers:

James Darnell, The Rockefeller University
Adrian Krainer, Cold Spring Harbor Laboratory
Mila Pollock, Cold Spring Harbor Laboratory

Speakers

Arnold Berk, University of California, Los Angeles
Douglas Black, HHMI, University of California, Los Angeles
George Brawerman, Tufts University School of Medicine
Sydney Brenner, Janelia Farm Research Campus, HHMI
Stephen Buratowski, Harvard Medical School
Louise Chow, University of Alabama
Juan Pablo Couso, University of Sussex, UK
James Darnell, The Rockefeller University
Gideon Dreyfuss, HHMI, University of Pennsylvania
Grigorii Georgiev, Russian Academy of Sciences, Russia
Adrian Krainer, Cold Spring Harbor Laboratory
Tom Maniatis, Columbia University Medical Center
James Manley, Columbia University
Lynne Maquat, University of Rochester Medical Center
Matthew Meselson, Harvard University
Melissa Moore, University of Massachusetts Medical School
Bernard Moss, National Institute of Allergy & Infectious Diseases
Arthur Pardee, Dana Farber Cancer Institute
Mila Pollock, Cold Spring Harbor Laboratory
Rich Roberts, New England BioLabs
Robert Roeder, The Rockefeller University
Mike Rosbash, Brandeis University
Robert Schleif, Johns Hopkins University
Robert Singer, Albert Einstein College of Medicine
Nahum Sonenberg, McGill University, Montréal, Québec, Canada
Joan Steitz, Yale University/ HHMI
David Tollervey, Wellcome Center for Cell Biology; University of Edinburgh, UK
Jonathan Warner, Albert Einstein College of Medicine
James Watson, Cold Spring Harbor Laboratory

So so much better no? 24:5 Male: Female or 17% female (for the speakers).

Finally I checked out Plasmids: History & Biology (2014)

Organizers

Dhruba Chattoraj, National Cancer Institute, Bethesda, MD
Stanley N. Cohen, Stanford University
Stanley Falkow, Stanford University
Richard Novick, New York University
Chris Thomas, University of Birmingham, UK
Jan Witkowski, Cold Spring Harbor Laboratory, NY

Speakers

Peter Barth, Helsby, Cheshire UK
Susana Brom, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
Ananda Chakrabarty, University of Illinois
Mike Chandler, Université Sabatier, Toulouse, France
Dhruba Chattoraj, National Cancer Institute, Bethesda, MD
Don Clewell, University of Michigan, Ann Arbor, MI
Stanley N. Cohen, Stanford University
Fernando de la Cruz, Universidad de Cantabria, Spain
R. Curtiss III, Arizona State University, Tempe, AZ
Julian Davies, University of British Columbia, Canada
Stanley Falkow, Stanford University
Laura Frost, University of Alberta, Edmonton, Alberta, Canada
Barbara Funnell, University of Toronto, Toronto, Ontario, Canada
Mathias Grote, Technische Universität Berlin, Germany
George A. Jacoby, Lahey Clinic, Burlington, MA
Mark Jones, Life Sciences Foundation, San Francisco, CA
Saleem Khan, University of Pittsburgh
Bruce Levin, Emory University, Atlanta, GA
John Mekalanos, Harvard Medical School
Marc van Montagu, Ghent University, Belgium
Richard Novick, New York University
David Sherratt, University of Oxford, UK
David Summers, University of Cambridge, UK
Chris Thomas, University of Birmingham, UK
Eva Top, University of Idaho, Moscow, ID
Gerhart Wagner, Uppsala University, Sweden
Michael Yarmolinsky, National Cancer Institute, Bethesda MD
Peter Young, University of York, UK

That comes to 24:4 for speakers or 14% female.

Notice any patterns? The totals for these meetings come to 16 women out of 142 speakers. Or ~11 %. That is a dismal record for Cold Spring Harbor Labs and certainly does not convince me that they are trying at all to have diversity represented at their meetings. I note – I truly love many things about CSHL. This is definitely not one of them.
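For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the male:female counts exactly as tallied above for each meeting. The gender assignments are the ones made in this post (with the same caveat about inferring gender from names), and the dictionary labels and the `percent_female` helper are names made up for this sketch.

```python
# Male:female speaker counts as tallied in the post for each meeting.
# The meeting labels are just keys chosen for this sketch.
counts = {
    "Evolution of Sequencing Technology (2015)": (43, 4),
    "Biotechnology: Past, Present & Future (2008)": (14, 2),
    "History of Restriction Enzymes (2013)": (21, 1),
    "Messenger RNA (2014)": (24, 5),
    "Plasmids: History & Biology (2014)": (24, 4),
}


def percent_female(men: int, women: int) -> float:
    """Share of speakers who are women, as a percentage."""
    return 100.0 * women / (men + women)


# Per-meeting ratios.
for meeting, (men, women) in counts.items():
    print(f"{meeting}: {men}:{women} male:female, "
          f"{percent_female(men, women):.1f} % female")

# Totals across all five meetings.
total_men = sum(men for men, _ in counts.values())
total_women = sum(women for _, women in counts.values())
print(f"All five meetings: {total_women} women out of "
      f"{total_men + total_women} speakers, "
      f"{percent_female(total_men, total_women):.1f} % female")
```

Changing a single count (for example if a speaker is added or replaced) and rerunning is enough to update every figure.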

UPDATE 2 – Some discussion of this post on Twitter

@nl_brown @phylogenomics wrote ‘plenty of women who have had & still have fundamentally important roles in the field of sequencing&seqtech’

— Geertje van Keulen (@DrGvanK) June 27, 2015

@DrGvanK Sadly, it is correct as the history of sequencing is male-dominated. Original phiX paper had 1/9 women authors and… (1/2)

— Nigel Brown (@nl_brown) June 27, 2015

@DrGvanK (2/2) …Gillian Air is on list. There will be 1 further replacement female/male. Not proud & could have done better on new techs

— Nigel Brown (@nl_brown) June 27, 2015

@DrGvanK Would be good if they were named and could be invited to speak.

— Nigel Brown (@nl_brown) June 27, 2015

@phylogenomics @DrGvanK Attempt was to represent the history of DNA sequencing. This is why I sought names we might have missed. (1/2)

— Nigel Brown (@nl_brown) June 27, 2015

@phylogenomics @DrGvanK (2/2) Taking diversity over actual history is both token & revisionist. Would have loved more equality in 1900s

— Nigel Brown (@nl_brown) June 27, 2015

@nl_brown @DrGvanK I don’t buy it; there are always different angles on history and technology and your appears severely skewed towards men

— Jonathan Eisen (@phylogenomics) June 27, 2015

@nl_brown @DrGvanK for example, though I love @gilbertjacka what role exactly did he have in history of DNA sequencing? (sorry Jack)

— Jonathan Eisen (@phylogenomics) June 27, 2015

@nl_brown @DrGvanK @gilbertjacka and if the meeting includes applications of sequencing I can think of dozens of women who could be there

— Jonathan Eisen (@phylogenomics) June 27, 2015

@nl_brown @DrGvanK @gilbertjacka many on speaker list who don’t do sequencing technology per se; if include those could include many others

— Jonathan Eisen (@phylogenomics) June 27, 2015

UPDATE 3 – Made a Storify w/ some of the discussions

View the story “Cold Spring Harbor History of Science Meetings Gender Bias” on Storify (storify.com/phylogenomics/cold-spring-harbor-history-of-science-meetings-gen)

Oxford Global Sequencing Meetings: Where MEN Tell You About Sequencing #YAMMM

Well, got an email invite to one of these Oxford Global Meetings. Sadly the gender ratio of listed speakers is awful. I highlighted the list below (men in yellow, women in green). Ratio of 17:3. (See below). No thanks Oxford Global.

Dear Professor Jonathan Eisen ,
We hope you are well and we would like to invite you to speak at our forthcoming Next Generation Sequencing (NGS) USA congress (www.nextgenerationsequencingusa-congress.com) or co-located Single Cell Analysis USA congress (www.singlecellusa-congress.com) to be held on 27th –28th October 2015 at Harvard Medical School, Boston, USA.
Over the two days, the NGS USA congress aims to cover updates and application of NGS technologies in genomics and genetics research in the US and UK, Europe. Topics are comprised of NGS & NGS Data Analysis Technologies and Platforms, NGS for Cancer Drug Development, Microbiology and Immunotherapy as well as Clinical Applications & Diagnostics. Novel updates in Gene Synthesis, Protein Sequencing and Targeted Sequencing will also be explored. The Single Cell Analysis USA congress looks at new methods in DNA sequencing, epigenomic DNA sequencing and RNA sequencing, informatics, data handling as well as application of single cell genomics in understanding cancer and other areas of cancer research such as cancer stem cells and immunotherapy. The presentations are also comprised of novel techniques in imaging and cytometry, isolation and processing of single cells. The congress also covers the applications in translational medicine and the clinic for therapeutic targeting.
The combination of carefully researched topics and high-level networking opportunities creates a unique discussion platform for over 250 senior scientists we are expecting in attendance from research institutions and pharmaceutical companies. Confirmed Speakers for 2015 include:

NGS
· Sreekumar Kodangattil, Senior Principal Scientist, Pfizer
· Shrikant M. Mane, Senior Research Scientist in Genetics; Director, MBB Keck Biotech laboratory; Director, Yale Center for Genome Analysis
· Stephan Schuster, Professor of Biochemistry and Molecular Biology, Penn State University
· Richard McCombie, Professor, Human Genetics, Cold Spring Harbor Laboratory
· Jingyue Ju, Director, Center for Genome Technology and Biomolecular Engineering, Professor of Chemical Engineering and Pharmacology, Columbia University
· Christopher Mason, Chair, ABRF NGS Consortium, Assistant Professor, Weill Cornell Medical College. Dept. of Physiology & Biophysics; The Brain & Mind Research Institute
· Michaela Bowden, Associate Director, Center for Molecular Oncologic Pathology, Dana Farber Cancer Institute
· Yuan Gao, Director, Associate Professor, Lieber Institute/Johns Hopkins University
· Sheng Li, Instructor in Bioinformatics, Department of Neurological Surgery, Weill Cornell Medical College
· Michael Fraser, Associate Director, CPC-GENE Prostate Cancer Genomics Program, Princess Margaret Cancer Centre

Single Cell
· Daniel Chiu, Professor, University of Washington
· Steve Potter, Professor, Division of Developmental Biology, Cincinnati Children’s Medical Center
· Norman Dovichi, Professor, University Notre Dame
· Zaida Luthey-Schulten, Professor of Chemistry, University of Illinois
· Paul Bohn, Professor of Chemical and Biomolecular Engineering, University of Notre Dame
· Navin Varadarajan, Assistant Professor, University of Houston
· Alexander R. Ivanov, Director of the HSPH Proteomics Resource, Research Scientist, Harvard School of Public Health
· Viktor Adalsteinsson, Researcher, Koch Institute at MIT, Broad Institute of MIT and Harvard Medical School
· Xinghua Victor Pan, Research Scientist, Single Cell Genomics Group, Sherman Weissman Laboratory, Department of Genetics, Yale University School of Medicine
· Cheng-Zhong Zhang, Computational Biologist, Department of Medical Oncology, Dana-Farber Cancer Institute

Do small organisms form species? New paper suggests not …

Quick post here about a press release about a new paper: New mathematical theory says small organisms may not form species. Seems that this new theoretical paper might be of interest to people out there. The paper is available openly here: http://rspb.royalsocietypublishing.org/content/280/1767/20131248.abstract. Not sure quite what to make of this but thought it would be of interest. The abstract is below:

The rapid advance in genetic sequencing technologies has provided an unprecedented amount of data on the biodiversity of meiofauna. It was hoped that these data would allow the identification and counting of species, distinguished as tight clusters of similar genomes. Surprisingly, this appears not to be the case. Here, we begin a theoretical discussion of this phenomenon, drawing on an individual-based ecological model to inform our arguments. The determining factor in the emergence (or not) of distinguishable genetic clusters in the model is the product of population size with mutation rate—a measure of the adaptability of the population as a whole. This result suggests that indeed one should not expect to observe clearly distinguishable species groupings in data gathered from ultrasequencing of meiofauna.

Crosspost: Woohoo – two more genome announcement papers from our undergraduate project on built environment reference genomes

Crossposting this from the microBEnet blog.

Two new papers out from the microBEnet Undergraduate Research: Built Environment Reference Genomes project:

Coil DA, Doctor JI, Lang JM, Darling AE, Eisen JA. 2013. Draft Genome Sequence of Kocuria sp. Strain UCD-OTCP (Phylum Actinobacteria). Genome Announc. 1(3):e00172-13. doi:10.1128/genomeA.00172-13.
Diep AL, Lang JM, Darling AE, Eisen JA, Coil DA. 2013. Draft genome sequence of Dietzia sp. strain UCD-THP (phylum Actinobacteria). Genome Announc. 1(3):e00197-13. doi:10.1128/genomeA.00197-13.

These go with two previously published ones:

Lo JR, Lang JM, Darling AE, Eisen JA, Coil DA. 2013. Draft genome sequence of an actinobacterium, Brachybacterium muris strain UCD-AY4. Genome Announc. 1(2):e00086-13. doi:10.1128/genomeA.00086-13
Bendiks ZA, Lang JM, Darling AE, Eisen JA, Coil DA. 2013. Draft genome sequence of Microbacterium sp. strain UCD-TDU (phylum Actinobacteria). Genome Announc. 1(2):e00120-13. doi:10.1128/genomeA.00120-13.

And two more coming. So proud of the undergrads in my lab who did this work and David Coil for coordinating it with help from Jenna Lang and Aaron Darling. Undergrads at UC Davis sequencing genomes of organisms they isolated. So cool.

Fermentation microbiomes part 2 from #UCDavis: American coolship ale microbiome

From Nick Bokulich: This is an image of the “coolship” where the cooling wort (pre-fermented beer) is left overnight and presumably where wild microbes are introduced to kick off the fermentation. This is the morning after, still full of wort.

Just a quick follow up to my recent post on How did I miss this? The botrytized wine microbiome … from #UCDavis colleague David Mills. There is a similar paper from the same group also in PLoS One from about the same time: PLOS ONE: Brewhouse-Resident Microbiota Are Responsible for Multi-Stage Fermentation of American Coolship Ale. What a job — microbes, ales and wines, and sequencing. One of the few times when reading a paper where I have said “I wish that was me doing that work.” … must look into getting involved in such studies …

Is Illumina the "duct tape" of sequencing?

Photo from Wikipedia. Photo by Evan-Amos.

For the last year or so I have become a big fan of Illumina sequencing. We are using it for everything in the lab. And many others are using it quite a lot too. All sorts of interesting applications. But of course – there are other sequencing systems that each have some advantages relative to Illumina. And one of the key limitations of Illumina sequencing has been the read length (though that limitation gets less and less as read lengths get longer and longer from Illumina machines).

The UC Davis Genome Center has had Illumina sequencing systems for many years now and we use them extensively. However, we felt for some time that we and others around town could benefit from complementary methods, especially those that could get longer reads. So we sought funding to buy other systems. And fortunately we got an NSF MRI grant to do just that – which we used to buy a Roche 454 Jr machine and contribute to the purchase of a Pacific Biosciences machine. These are good to have around because they open up new windows into sequencing – not just long reads but other areas as well. For example, the PacBio system can also be used to detect modifications to bases, like methylation.

Alas, both the 454 and PacBio systems have higher error rates than the Illumina systems. And this makes some analyses challenging and limits the benefits that come from the longer reads. So what to do? For a while people have been using Illumina sequencing to “correct” the errors made by 454 and PacBio sequencing. And today Matt Herper at Forbes (For A New DNA Sequencer, A Technical Fix May Have Come Too Late – Forbes) discusses a new further improvement in the ability to do this error correction (a paper just came out on the topic from Adam Phillippy, Sergey Koren, Michael Schatz, and others).
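To make the general idea concrete, here is a toy Python sketch of correcting a long, error-prone read with short, accurate reads. To be clear about the assumptions: this is not the method from the paper mentioned above. It assumes the short reads have already been aligned to the long read at known offsets, it handles substitution errors only (no insertions or deletions), and the function name `correct_long_read` and the example sequences are invented for illustration. At each position it takes a simple majority vote among the short reads that cover it.

```python
from collections import Counter


def correct_long_read(long_read, short_reads):
    """Toy majority-vote correction.

    long_read   -- the error-prone long read (string of bases)
    short_reads -- list of (offset, sequence) pairs, assumed to be already
                   aligned to long_read with substitutions only
    """
    # One pile of observed bases per position of the long read.
    piles = [Counter() for _ in long_read]
    for offset, seq in short_reads:
        for i, base in enumerate(seq):
            pos = offset + i
            if 0 <= pos < len(long_read):
                piles[pos][base] += 1

    corrected = []
    for original, pile in zip(long_read, piles):
        if pile:
            base, count = pile.most_common(1)[0]
            # Overwrite only when a strict majority of short reads agree;
            # otherwise keep the long read's own base.
            corrected.append(base if 2 * count > sum(pile.values()) else original)
        else:
            # No short-read coverage here: nothing to correct with.
            corrected.append(original)
    return "".join(corrected)


# Made-up example: two substitution errors in the long read.
truth = "ACGTACGTAC"
noisy = "ACGAACGTTC"
short_reads = [
    (0, "ACGTA"), (1, "CGTAC"), (2, "GTACG"),
    (4, "ACGTA"), (5, "CGTAC"), (6, "GTAC"),
]
print(correct_long_read(noisy, short_reads))
```

Real pipelines have to do the hard parts this sketch skips, in particular finding the alignments in the first place and dealing with insertions and deletions.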

I find this whole concept a bit funny / interesting. Not only does Illumina sequencing have many uses but one of its uses in essence helps keep aloft the potential of some of its competitors. In this way – Illumina can be considered the duct tape of sequencing systems. 1001 uses. Not sure the Illumina folks will be overly thrilled with this use but that is the way it goes …

(As an aside – any high throughput highly accurate sequencing method could be used in the same way as Illumina in most cases – ABI SOLiD for example. But alas for ABI, Illumina has kind of taken over this part of the market).

(As another aside – we will have to wait and see how/if the Ion Torrent systems take off in the sequencing ecosystem).

(As another aside – still waiting to see some more detail from the Oxford Nanopore folks … I would be happy to be a beta tester if anyone from Oxford is reading this).

Elaine Mardis rocks: nice talk on "Next generation sequencing"

I wish I had seen this before I gave my first lecture on Next Gen Sequencing Methods on Monday. I will post mine later but here is a really really nice talk by Elaine Mardis from Washington University on the same topic:

Wanted – opinions/details on online systems for annotation of genomes and metagenomes

Doing a little survey/snooping around. Trying to compile a list of available online tools for annotating microbial genomes and metagenomes. And I am also trying to get comments on what people think of the various tools. There are some obvious candidates to think about:

IMG and IMG/M
RAST and MG-RAST
IGS annotation engine (which seems to no longer be a web server but an “email the sequence to someone” server).

But given that there are certainly many many more out there I decided to post a request to Twitter and Google plus and got some responses.

Jonathan Eisen ‏ @phylogenomics

Researching blog post on free/online microbial genome/metagenome annotation services – looking for examples beyond IMG & RAST

Mick Watson @BioMickWatson

@phylogenomics some microbial annotation pipelines mentioned in our review here bib.oxfordjournals.org/content/early/…

Ewan Birney ‏ @ewanbirney

@phylogenomics Check out @EBImetagenomics ebi.ac.uk/metagenomics/: ORFs, Interproscan, submissions and more.

Mick Watson ‏ @BioMickWatson

@ewanbirney @phylogenomics @EBImetagenomics needs illumina support, and support of assemblies

And from Google Plus where I asked “Researching blog post on free/online microbial genome/metagenome annotation services – looking for examples beyond IMG & RAST “:

Mary Mangan – Oh, I know some:

Manatee: http://www.jcvi.org/cms/research/projects/annotation-service/
GATU: http://athena.bioc.uvic.ca/virology-ca-tools/gatu/
Apollo: http://gmod.org/wiki/Apollo

And also from Mary

Here’s a question on that at BioStar:
EDIT: Another question suggests Artemis:

Cool paper from DerisiLab on viruses in unknown tropical febrile illnesses #metagenomics #viroarray

Quick post:

Figure 3. Circovirus-like NI sequence coverage and phylogeny.

Cool new paper from Joe Derisi’s lab: PLoS Neglected Tropical Diseases: Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

Full citation: Yozwiak NL, Skewes-Cox P, Stenglein MD, Balmaseda A, Harris E, et al. (2012) Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing. PLoS Negl Trop Dis 6(2): e1485. doi:10.1371/journal.pntd.0001485

They used a combination of a viral microarray and metagenomic sequencing to characterize viruses in various samples from patients with febrile illness. And they found some semi-novel viruses in the sample. Definitely worth a look.

Note – here are some other posts of mine about Derisi:

See some follow up discussion on Google+ here.

The story behind Pseudomonas syringae comparative genomics / pathogenicity paper; guest post by David Baltrus (@surt_lab)

More fun from the community. Today I am very happy to have another guest post in my “Story behind the paper” series. This one comes to us from David Baltrus, an Assistant Professor at University of Arizona. For more on David see his lab page here and his twitter feed here. David has a very nice post here about a paper on the “Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates” which was published in PLoS Pathogens in July. There is some fun/interesting stuff in the paper, including analysis of the “core” and “pan” genome of this species. Anyway – David saw my request for posts and I am very happy that he responded. Without further ado – here is his story (I note – I added a few links and Italics but otherwise he wrote the whole thing …).

—————————————

I first want to thank Jonathan for giving me this opportunity. I am a big fan of “behind the science” stories, a habit I fed in grad school by reading every Perspectives (from the journal Genetics) article that I could get a hold of. Science can be rough, but I remember finding solace in stories about the false starts and triumphs of other researchers and how randomness and luck manage to figure into any discovery. If anything I hope to use this space to document this as it is fresh in my mind so that (inevitably) when the bad science days roll around I can have something to look back on. At the very least, I’m looking forward to mining this space in the future for quotes to prove just how little I truly understood about my research topics in 2011. It took a village to get this paper published, so apologies in advance to those that I fail to mention. I also want to mention this upfront: Marc Nishimura is my co-author and had a hand in every single aspect of this paper.

Joining the Dangl Lab

This project really started way back in 2006, when I interviewed for a postdoc with Jeff Dangl at UNC Chapel Hill. In grad school I had focused on understanding microbial evolution and genetics but I figured that the best use of my postdoc would be to learn and understand genomics and bioinformatics. I was just about to finish up my PhD and was lucky enough to have some choices when it came around to choosing what to do next. I actually had no clue about Dangl’s research until stumbling across one of his papers in Genetics, which gave me the impression that he was interested in bringing an evolutionary approach to studies of the plant pathogen Pseudomonas syringae. I was interested in plant pathogens because, while I wanted to study host/pathogen evolution, my grad school projects on Helicobacter pylori showed me just how much fun it is dealing with the bureaucracy of handling human pathogens. There is extensive overlap in the mechanisms of pathogenesis between plant and human pathogens, but no one really cares how many Arabidopsis plants you infect or if you dispose of them humanely (so long as the transgenes remain out of nature!). By the time I interviewed with Jeff I was leaning towards joining a different lab, but the visit to Chapel Hill went very well and by the end I was primed for Dangl’s sales pitch. This went something along the lines of “look, you can go join another lab and do excellent work that would be the same kinds of things that you did in grad school…or you can come here and be challenged by jumping into the unknown”. How can you turn that down? Jeff sold me on continuing a project started by Jeff Chang (now a PI at Oregon State), on categorizing the diversity of virulence proteins (type III effector proteins to be exact) that were translocated into hosts by the plant pathogen Pseudomonas syringae. 
Type III effectors are one of the main determinants of virulence in numerous Gram-negative plant and animal pathogens and are translocated into host cells to ultimately disrupt immune functions (I’m simplifying a lot here). Chang had already created genomic libraries and had screened through random genomic fragments of numerous P. syringae genomes to identify all of the type III effectors within 8 or so phylogenetically diverse strains. The hope was that they would find a bunch of new effectors by screening strains from different hosts. Although this method worked well for IDing potential effectors, I was under the impression that it was going to be difficult to place and verify these effectors without more genomic information. I was therefore brought in to figure out a way to sequence numerous P. syringae genomes without burning through a Scrooge McDuckian money bin worth of grant money. We had a thought that some type of grand pattern would emerge after pooling all this data but really we were taking a shot in the dark.

Tomato leaves after 10 days of infection by the tomato pathogen P. syringae DC3000 (left) as well as a less virulent strain (right). Disease symptoms are dependent on a type III secretion system.

Moments of Randomness that Shape Science

When I actually started the postdoc, next generation sequencing technologies were just beginning to take off. It was becoming routine to use 454 sequencing to generate bacterial genome sequences, although Sanger sequencing was still necessary to close these genomes. Dangl had it in his mind that there had to be a way to capitalize on the developing Solexa (later Illumina) technology in order to sequence P. syringae genomes. There were a few strokes of luck here that conspired to make this project completely worthwhile. First, I arrived at UNC about a year before the UNC Genome Analysis core facility came online. Sequencing runs during the early years of this core facility were subsidized by UNC, so we were able to sequence many Illumina libraries very cheaply. This gave us the opportunity to play around with sequencing options at low cost, so we could explore parameter space and find the best sequencing strategy. This also meant that I was able to learn the ins and outs of making libraries at the same time as those working in the core facility (Piotr Mieczkowski was a tremendous resource). Secondly, I started this postdoc without knowing a lick of UNIX or Perl and knew that I was going to have to learn these if I had any hope of assembling and analyzing genomes. I was very lucky to have Corbin Jones and his lab three floors above me in the same building to help work through my kindergarten-level programming skills. Corbin was really instrumental to all of these projects as well as in keeping me sane, and I doubt that these projects would have turned out anywhere near as well without him. Lastly, plant pathogens in general, and P. syringae in particular, were poised to greatly benefit from next generation sequencing in 2006. 
While there was ample funding to completely sequence (close) genomes for numerous human pathogens, lower funding opportunities for plant pathogens meant that we were forced to be more creative if we were going to pull off sequencing a variety of P. syringae strains. This pushed us into trying an NGS approach in the first place. I suspect that it’s no coincidence that, independently of our group, the NGS assembler Velvet was first utilized for assembling P. syringae isolates.

The Frustrations of Library Making

Through a collaboration with Elaine Mardis’s group at Washington University in St. Louis, we got some initial data back that suggested it would be difficult to make sense of bacterial genomes at that time using only Illumina (the paired end kits weren’t released until later). There simply wasn’t good enough coverage of the genome to create quality assemblies with the assemblers available at this time (SSAKE and VCAKE, our own (really Will Jeck’s) take on SSAKE). Therefore we decided to try a hybrid approach, combining low coverage 454 runs (initially separate GS FLX runs with regular reads and paired ends, and later one run with long paired ends) with Illumina reads to fill in the gaps, and leveraging this data to correct for any biases inherent in the different sequencing technologies. Since there was no core facility at UNC when I started making libraries, I had to travel around in order to find the necessary equipment. The closest place that I could find a machine to precisely shear DNA was Fred Dietrich’s lab at Duke. More than a handful of mornings were spent riding a TTA bus from UNC to Duke with a cooler full of genomic DNA on dry ice (most times having to explain to the bus drivers how I wasn’t hauling anything dangerous), spending a couple of hours on Fred’s hydroshear, then returning to UNC hoping that everything worked well. There really is no feeling like spending half a day travelling and shearing only to find out that the genomic DNA ended up the wrong size. We were actually planning to sequence one more strain of P. syringae, and already had Illumina data, but left this one out because we filled two plates of 454 sequencing and didn’t have room for a ninth strain. In the end there were two very closely related strains (P. syringae aptata and P. syringae atrofaciens) left to make libraries for, and the aptata genome sheared better on the last trip than atrofaciens. 
If you’ve ever wondered why researchers pick certain strains to analyze, know that sometimes it just comes down to which strain worked first. Sometimes there were problems even when the DNA was processed correctly. I initially had trouble making the 454 libraries correctly in that, although I would follow the protocol exactly, I would lose the DNA somewhere before the final step. I was able to track down the problem to using an old bottle of salmon sperm DNA during library prep (I have no clue when the Dangl lab bought it, but it looked as usable as salmon sperm ever does). There were also a couple of times that I successfully constructed Illumina libraries only to have the sequencing runs dominated by a few actual sequences. These problems ultimately stemmed from trying to use homebrew kits (I think) for constructing Illumina libraries. Once these problems were resolved, Josie Reinhardt managed to pull everything together and create a pipeline for hybrid genome assembly, and we published our first hybrid genome assembly in Genome Research. At that moment it was a thrill that we could actually assemble a genome for such a low cost. It definitely wasn’t a completely sequenced genome, but it was enough to make calls about the presence or absence of genes.

Waiting for the Story to Emerge

There are multiple ways to perform research. We are all taught about how important it is to define testable hypotheses and to set up appropriate experiments to falsify these educated guesses. Lately, thanks to the age of genomics, it has become easier and more feasible to accumulate as much genomic data as possible and find stories within that data. We took this approach with the Pseudomonas syringae genome sequences because we knew that there was going to be a wealth of information, and it was just a matter of what to focus on. Starting my postdoc I was optimistic that our sampling scheme would allow us to test questions about how host range evolves within plant pathogens (and conversely, identify the genes that control host range) because the strains we were going to sequence were all isolated from a variety of diseased hosts. My naive viewpoint was that we were going to be able to categorize virulence genes across all these strains, compare suites of virulence genes from strains that were pathogens of different hosts, and voila…we would understand host range evolution. The more I started reading about plant pathology the more I became convinced that this approach was limited. The biggest problem is that, unlike some pathogens, P. syringae can persist in a variety of environments, with strains able to survive or flourish on a variety of hosts. Sure we had strains that were known pathogens of certain host plants, but you can’t just assume that these are the only relevant hosts. Subjective definitions are not your friend when wading into the waters of genomic comparisons.

We were quite surprised that, although type III effectors are gained and lost rapidly across P. syringae and our sequenced strains were isolated from diverse hosts, we only managed to identify a handful of new effector families. I should also mention here that Artur Romanchuk came on board and did an extensive amount of work analyzing gene repertoires across strains. A couple of nice stories did ultimately emerge by comparing gene sequences across strains and matching these up with virulence in planta (we were able to show how mutation and recombination altered two different virulence genes across strains), but my two favorite stories from this paper came about from my habit of persistently staring at genome sequences and annotations. As I said above, a major goal of this paper was to categorize the suites of a particular type of virulence gene (type III effectors) across P. syringae. I was staring at gene repertoires across strains when I noticed that two of the strains had very few of these effectors (10 or so) compared to most of the other strains (20-30). When I plotted total numbers of effectors across strains, a phylogenetic pattern arose where genomes from a subset of closely related P. syringae strains possessed lower numbers of effectors. I then got the idea to survey for other classes of virulence genes, and sure enough, strains with the lowest numbers of effectors all shared pathways for the production of well characterized toxins (nonribosomal peptide synthetase (NRPS) toxins are secreted out of P. syringae cells and are virulence factors, but are not translocated through the type III secretion system). One exception did arise across this handful of strains (a pea pathogen isolate from pathovar pisi) in that this strain has lost each of these conserved toxin pathways and also contains the highest number of effectors within this phylogenetic group. 
The relationship between effector number and toxin presence remains a correlation at the present time, but I’m excited to be able to try and figure out what this means in my own lab.

Modified Figure 3 from the paper. Strain names are listed on the left and are color coded for phylogenetic similarity. Blue boxes indicate that the virulence gene/toxin pathway is present, green indicates that the pathway is likely present but the sequence was truncated or incomplete, while a white box indicates absence. I have circled the group II strains, which have the lowest numbers of type III effectors while also having two conserved toxin pathways (syringomycin and syringolin). Note that the pisi strain (Ppi R6) lacks these toxin pathways.

The other story was a complete stroke of luck. P. syringae genomes are typically 6Mb (6 million base pairs) in size, but one strain that we sequenced (a cucumber pathogen) contained an extra 1Mb of sequence. Moreover, the two largest assembled contigs from this strain were full of genes that weren’t present in any other P. syringae strain. After some similarity comparisons, I learned that there was a small bit of overlap between these two contigs and performed PCR to confirm this. Then, on a hunch, I designed primers facing out of each end of the joined contig and was able to confirm that this extra 1Mb of sequence was circular in conformation and likely separate from the chromosome. I got a bit lucky here because there was a small bit (500bp or so) of sequence that was not assembled with either of these two contigs that closed the circle (a lot more and I wouldn’t have gotten the PCR to work at all). We quickly obtained 3 other closely related strains and were able to show that only a subset of strains contain this extra 1Mb and that it doesn’t appear to be directly involved in virulence on cucumber. So it turns out that a small number (2 so far) of P. syringae strains have acquired an extra 1Mb of DNA, and we don’t quite know what any of these ~700 extra genes do. There are no obvious pathways present aside from additional chromosomal maintenance genes, extra tRNAs in the same ratio as the chromosomal copies, and a couple of secretion systems. So somehow we managed to randomly pick the right strain to capture a very recent event that increased the genome size of this one strain by 15% or so. We’ve made some headway on this megaplasmid story since I started my lab, but I’ll save that for future blog posts.
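For readers who want to see what the "small bit of overlap" check looks like in practice, here is a minimal Python sketch. It is not the analysis used for the paper (that relied on similarity comparisons followed by PCR); the function name and the toy sequences are made up for illustration, and it assumes both contigs are written on the same strand.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a
    prefix of `b`, or 0 if no overlap of at least `min_len` bases exists."""
    max_len = min(len(a), len(b))
    # Try the longest candidate overlap first and shrink until one matches.
    for k in range(max_len, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


# Hypothetical toy contigs (real contigs would be hundreds of kb long).
contig_a = "ATGCCGTTAGGCTA"
contig_b = "GGCTATTTACCGAT"

# Junction 1: the end of contig_a runs into the start of contig_b.
junction_1 = suffix_prefix_overlap(contig_a, contig_b)

# Junction 2: the end of contig_b back to the start of contig_a.
# No overlap here mimics an unassembled gap, which needs PCR to close.
junction_2 = suffix_prefix_overlap(contig_b, contig_a)

print(junction_1, junction_2)
```

In this toy case one junction shows an overlap and the other does not, which is roughly the situation described above: an overlap can suggest that two contigs are neighbors, but only the outward-facing primers and PCR could show that the second junction closes into a circle.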

Modified Figure S12 from the paper. Strains that contain the 1Mb megaplasmid (Pla7512 and Pla107) are slightly less virulent during growth in cucumber than strains lacking the megaplasmid (PlaYM8003, PlaYM7902). This growth defect is also measurable in vitro. In case you are wondering, I used blue and yellow because those were the colors of my undergrad university, the University of Delaware.

Reviewer Critiques

We finally managed to get this manuscript written up by the summer of 2010 and submitted it to PLoS Biology. I figured that (as always) it would take a bit of work to address reviewers’ critiques, but we would nonetheless be able to publish without great difficulty. I was at a conference on P. syringae at Oxford in August of 2010 when I got the reviews back and learned that our paper had gotten rejected. Everyone has stories about reviewer comments and so I’d like to share one of my own favorites thus far. I don’t think it ever gets easier to read reviews when your paper has been rejected, but I was knocked back by the main critique of one reviewer:

“I realize that the investigators might not typically work in the field of bacterial genomics, but when looking at divergent strains (as opposed to resequencing to uncover SNPs among strains) it is really necessary to have complete, not draft, genomes. I realize that this might sound like a lot to ask, but if they look at comparisons of, for example, bacterial core and pan-genomes, such as the other paper on this that they cite (and numerous other examples exist), they are based on complete genome sequences. If this group does not wish to come up to the standards applied to even the most conventional bacterial genomics paper, it is their prerogative; however, they should be aware of the expectations of researchers in this field.”

So this reviewer was basically asking us to spend an extra 50k to finish the genomes for these strains before they were scientifically useful. Although I do understand the point, this paper was never about getting things perfect but about demonstrating what is possible with draft genomes. I took the part about working in the field of bacterial genomics a bit personally, I have to admit (c’mon, that’s harsh), but I got over that feeling by downing a few pints in Oxford with other researchers who (judging by their research and interest in NGS) also failed to grasp the importance of spending time and money to close P. syringae genomes. We managed to rewrite this paper to address most of the other reviewers’ critiques and finally were able to submit to PLoS Pathogens.

Baltrus DA, Nishimura TM, Reinhardt JA, Romanchuk A, Chang JH, Mukhtar MS, Cherkis K, Roach J, Grant SR, Jones CD, Dangl JL “Dynamic evolution of pathogenicity revealed by sequencing and comparative genomics of 19 Pseudomonas syringae isolates” PLoS Pathogens 7(7):e1002132

Baltrus Lab Website

Dangl Lab Website

Jones Lab Website
