I am highly skeptical of the CHORUS system proposed by scientific publishers as an end run around PubMed Central

Just read this news story … Scientific Publishers Offer Solution to White House’s Public Access Mandate – ScienceInsider

It reports on an effort by various scientific publishers to create something they call “CHORUS” which stands for “Clearinghouse for the Open Research of the United States.” They claim this will be used to meet the guidelines issued by the White House OSTP for making papers for which the work was supported by federal grants available for free within 12 months of being published.

This appears to be an attempt to kill databases like PubMed Central, which is where such freely available publications are now archived.  I am very skeptical of the claims made by publishers that papers that are supposed to be freely available will in fact be made freely available on their own websites.  Why, you may ask, am I skeptical of this?  I suggest you read my prior posts on how Nature Publishing Group continuously failed to fulfill their promises to make genome papers freely available on their website.

See for example:

We need to make sure such papers are freely available permanently and the only way to do this is via making them available outside of the publishers’ own sites.  PubMed Central seems to be a good solution for this.  I would be happy to hear other possible solutions – but leaving “free” papers under the control of the publishers is a bad idea.

UPDATE 6/27/2013

Saw this Tweet

Seemed potentially really interesting. Read the story and got pointed to a new Nature paper on the ancient horse genome. I guess not so surprisingly, despite the fact that they report a new genome sequence, it is not openly available. We really cannot trust Nature on this can we? They could say “Well, this is a draft genome, and we did not mean to apply our policy to draft genomes.” Well, that would be weird since, well, they have applied this to draft genomes before. And then I decided to search for other examples … and in about ten minutes I found a few. See


ICG Europe starts w/ "Omics & the future of man" & sticks to men the rest of the time

Fun.  Another day.  Another YAMMGM (yet another mostly male genomics meeting).  This one is the International Conference on Genomics Europe 2013.  I have copied the program as it is now here and then highlighted the men and women as far as I can tell.  And, well, it is not very balanced.  It starts off, ironically, with “Omics and the future of man” and then stays on both omics and alas, men, for most of the meeting.  The first woman does not talk until 5 pm on the first day.  Nothing against BGI per se.  But they seem to be repeat offenders in having meetings with mostly male speakers.  A difference between countries?  Perhaps.  But unfortunate and unpleasant nevertheless.

Sessions with speakers:

Plenary Session 1: Omics and the future of man

  • 09:00-09:10: Opening ICG-Europe 2013 & Welcome: Hans Galjaard, Chairman of the Department of Clinical Genetics at Erasmus University
  • 09:10-09:55: Talk 1: Huanming Yang, BGI, China
  • 9:55-10:25: Talk 2: Jeremy Nicholsen, Head of the Department of Surgery and Cancer, Imperial College London, UK
  • Topic: Molecular Phenotyping and Systems Medicine Approaches in Personalised and Public Healthcare

Chairman: Prof.Huanming Yang, BGI, China

Plenary Session 2 :

  • 11:00-11:30: Talk 1 (30 min): Jun Wang, CEO, BGI, China
  • 11:30-12:00: Talk 2 (30 min): Karsten Kristiansen, Head of the Department of Biology, University of Copenhagen, Denmark
  • 12:00-12:30: Talk 3 (30 min): Nils Brunner, Director of the Sino-Danish Breast Cancer Research Centre, University of Copenhagen, Denmark
  • Topic: Docetaxel resistance in vitro: Known mechanisms and novel pathways in breast cancer
  • Chairman: Prof. Jun Wang, BGI, China

Plenary Session 3: Plant and Animal Genomics

  • 13:30-13.55: Talk 1: Rajeev K. Varshney, Director-Centre of Excellence in Genomics, ICRISA, Hyderabad, India
  • Topic: “Little” is “more” for chickpea and pigeonpea
  • 13.55-14.20: Talk 2: Michael Bevan, Genomics and Functional Genomics of Bread Wheat for Crop Improvement, John Innes Centre, Norwich, UK
  • Topic: Genomics and Functional Genomics of Bread Wheat for Crop Improvement
  • 14.20-14.45: Talk 3: Michel Georges, Unit of Animal Genomics, University of Liège, Belgium
  • 14.45-15.15: Talk 4: Tomas Marques, ICREA Research Professor, Universitat Pompeu Fabra, Spain
  • Topic: Great Ape genetic diversity
  • 15.15-15.35: Talk 5: TBC
  • Chairman: Prof. Marc Van Montagu , VIB, Belgium

Session 4: Cancer genomics and Transcriptional Regulation

  • 16:00-16:20: Talk 1(20 min): Stein Aerts, Heading the Laboratory of Computational Biology, K.U.Leuven, Belgium
  • Topic: Probing into the genome, transcriptome, and regulatory network of T-cell acute lymphoblastic leukemia
  • 16:20-16:40: Talk 2(20 min): Lars Bullinger, Assistant Professor, University of Ulm, Germany
  • Topic: Genomics in acute myeloid leukemia (AML) – clinical translation of findings
  • 16:40-17:00: Talk 3(20 min): Diether Lambrechts, Assistant Professor, K.U.Leuven & VIB, Belgium
  • Topic: Mutation signatures of mismatch repair deficiency in cancer genomes
  • 17:00-17:20: Talk 4(20 min): Lynnette Fernandez-Cuesta, University of Cologne, Germany
  • Topic: Characterization of lung neuroendocrine tumors
  • 17:20-17:40: Talk 5(20 min): Henrik Ditzel, University of Southern Denmark, Denmark
  • Chairman: Dr. Jan Cools (K.U.Leuven, VIB)

Workshop:Innovation-Entrepreneurship and Venture creation-1

  • 14:30-14:50: Talk 1 (20 min): Boo Edgar, Program Director, Innovation and entrepreneurship; The Sahlgrenska Academy, University of Gothenburg
  • 14:50-15:10: Talk 2 (20 min): Martin Bonde, Chairman of Danish Biotech association
  • 15:10-15:30: Talk 3 (20 min): Søren Møller, Managing Investment Director, Novo Seeds
  • Chairman: Johan Cardoen
  • 16:00-16:20: Talk 1(20 min): Johan Cardoen, Managing Director VIB
  • 16:20-16:40: Talk 2(20 min): Patrick Van Beneden, GIMV
  • 16:40-17:00: Talk 3(20 min): Ann De Beuckelaer, Flanders Bio

Session 5: Human Disease- Structural Genomic Variation and Function

  • 09:00-09:30: Talk 1 (30 min): Wigard Kloosterman, UMC Utrecht, The Netherlands
  • Topic: Cause and Consequence of Complex Genomic Rearrangements
  • 09:30-10:00: Talk 2 (30 min): Michael Talkowski, Instructor, MGH, Harvard University, USA
  • Topic: Sequencing unique human genomes reveals novel loci in autism and predictive phenotypes in prenatal diagnostics
  • 10:00-10:30: Talk 3 (30 min): Thierry Voet, K.U.Leuven
  • Chairman: Prof. Edwin Cuppen , Hubrecht Institute

Session 6: Metagenomics

  • 09:00-09:30: Talk 1 (30 min): Hui Wang, The Centre for Ecology & Hydrology, UK
  • Topic: Virus discovery by using deep sequencing data
  • 09.30-10:00: Talk 2 (30 min): TBC
  • 10:00-10:30: Talk 3 (30 min): Bjoern Textor, New England Biolabs GmbH
  • Topic: Direct Selection of Microbiome DNA from Host DNA
  • 11:00-11:30: Talk 1 (30 min): Jeroen Raes, Scientific Collaborator, VUB&VIB
  • 11:30-12:00: Talk 2 (30 min): Rob Knight, Associate Professor, Colorado University
  • Topic: Characterizing microbial effects of family structure, including our furry family members?
  • 12:00-12:30: Talk 3 (30 min): Ruth Ley, Cornell University
  • Topic: Host control of the microbiome
  • Chairman: Dr. Jeroen Raes (VUB, VIB)

Session 7(3 talks: include Q&A 5 mins): Human Disease – Clinical Genetics

  • 11:00-11:35: Talk 1(35 min): Han Brunner, Department of Human Genetics, Radboud University Nijmegen Medical Centre, The Netherlands
  • Topic: Clinical Genetic Diagnostics by Genome Sequencing.
  • 11:35-12:05: Talk 2(30 min): Wang Wei, BGI Health, Shenzhen, China
  • Topic: Non-invasive prenatal testing (NIPT): Current clinical application and future outlook
  • 12:05-12:45: Talk 3(30 min): Gabor Vajta, BGI Europe, Copenhagen, Denmark and Central Queensland University, Rockhampton, Australia in concert with Du Yutao, BGI Health, Shenzhen, China
  • Topic: Pre-implantation Diagnostics by Blastocyst Biopsy, Vitrification and Genome Sequencing
  • Chairman: Prof. Lars Bolund, Aarhus University

Session 8: Health and Translational Medicine-1

  • 13:30-13:55: Talk 1(25 min): Vince Gao, BGI
  • Topic: Development of Clinical Service at BGI Health
  • 13.55-14:20: Talk 2(25 min): Attila Lorincz, UK
  • Topic: Clinical Validation of Genomic and Epigenomic Biomarker Panels
  • 14:20-14:45: Talk 3(25 min): Maurizio Ferrari, Director of Clinical Molecular Biology and Cytogenetics Laboratory, and Head of Genomic Unit for the Diagnosis of Human Pathologies, Center for Translational Genomics and Bioinformatics, IRCCS San Raffaele, Milan, Italy
  • Topic: From bench to bedside: new advanced molecular techniques for genetic diagnosis
  • 14:45-15:10: Talk 4(25 min): Carlos Simón Vallés, Board Certified and Full Professor of Obstetrics and Gynecology at the University of Valencia,Spain
  • Topic: Clinical Application of the endometrial receptivity array
  • 15:10-15:35: Talk 5(20 min): To be selected from submitted abstracts
  • Chairman: Dr. Vince Gao , BGI

Session 9: Human disease

  • 13:30-13:55: Talk 1(25 min): Lars Bolund, Professor of Clinical Genetics at Aarhus University, Denmark, and Adjunct Professor of Human Genetics at Copenhagen University, Denmark
  • Topic: Chronic Disorders, Rare Genetic Variants and Pig Models of Degenerative Disease Processes
  • 13:55-14:20: Talk 2(25 min): Tao Dong, Head of anti-viral T cell immunology group, MRC Human Immunology Unit, Oxford University, UK
  • 14:20-14:45: Talk 3(25 min): Hartmut Wekerle, Honorary Professor, Max Planck Institute of Neurobiology, Martinsried, Germany
  • 14:45-15:10: Talk 4(20 min): Ramneek Gupta, The Technical University of Denmark, Denmark
  • 15:10-15:30: Talk 5(20 min): Anders Børglum, Professor, Aarhus University, Denmark
  • Chairman: TBC

Session 10: Health and Translational Medicine-2

  • 16:00-16:20: Talk 1(20 min): Diana M Eccles, Academic Vice President of the Clinical Genetics Society, Southampton General Hospital, UK
  • 16:20-16:40: Talk 2(20 min): E. Gomez Garcia, Maastricht University, the Netherlands
  • 16:40-17:00: Talk 3(20 min): Pascal Pujol , Chu Montpellier, France
  • 17:00-17:20: Talk 4(20 min): Atocha Romero, Hospital Clinico San Carlos, Spain
  • 17:20-17:40: Talk 5(20 min): Ian Campbell, Peter MacCallum Cancer Centre, Australia
  • Topic: Identification and validation of familial cancer susceptibility genes using massively parallel sequencing
  • Chairman: Prof. Yves-Jean Bignon, Centre Jean Perrin

Workshop: Ethical, Legal and Social Implications (ELSI)

  • 16:00-16:20: Talk 1(20 min): Lone Frank, Denmark
  • 16:20-16:40: Talk 2(20 min): Pascal Borry, K.U.Leuven, Belgium
  • 16:40-17:00: Talk 3(20 min): TBC
  • Chairman: Prof. Huanming Yang, BGI

Session 11: Biobanks

  • 08:00-08:30: Talk 1 (30 min): Zhang Yong, BGI, China
  • 08:30-09:00: Talk 2 (30 min): Kristian Hveem, Chief Scientific Officer, Nord-Trondelag County, Norway
  • 09:00-09:30: Talk 3 (30 min): Shaoliang Peng, National University of Defense Technology, China
  • Topic: Bioinformatics and Computational Biology on TianHe Supercomputer
  • Chairman: Dr. Zhang Yong, BGI

Workshop: Use of Omics Technology for Personalized Medicine

  • 08:00-08:30: Talk 1 (30 min): Jenny Wei, R&D Information China, AstraZeneca global R&D
  • Topic: Genomics for Personalized Medicine: From Discovery to Clinic
  • 08:30-09:00:Talk 2 (30 min):André Rosenthal, CEO, Signature Diagnostics AG
  • Topic: Next-Gen Sequencing Tests for Prognosis and Prediction of Response to Therapy of Patients with Colorectal Cancer Using Somatic Mutation Signatures
  • 09:00-09:30: Talk 3 (30 min):Radoje Drmanac, Complete Genomics, Inc. Mountain View, California, U.S.A.
  • Topic: Accurate whole genome sequencing as the ultimate genetic test enabling personalized disease prevention and treatment
  • Chairman: TBC

Session 12: Bioinformatics

  • 10:00-10:30: Talk 1 (30 min): Nathaniel Street, Assistant professor, Umea University
  • Topic: Sequencing the Norway spruce genome reveals a unique history of repeat expansion
  • 10:30-11:00: Talk 2 (30 min): Sofie Van Landeghem, Ghent University, VIB, Belgium
  • Topic: Mining the literature to enhance integrative network biology
  • 11:00-11:30: Talk 3 (30 min): Mario Caccamo, Acting Director at The Genome Analysis Centre, Norwich, UK
  • Topic: Next Generation Genomics for Complex Crops
  • Chairman: Prof. Yves Van De Peer (U.Ghent, VIB)
For related posts see

Twisted tree of life award #15: NBC News on "Junk DNA mystery"

Oh for fu$*# sake.  Really MSNBC?  I mean, I know perhaps I should not expect much from some in the press but this is just awful: ‘Junk’ DNA mystery solved: It’s not needed.

Brought to us by NBC News and LiveScience (which actually can have some pretty good science coverage).  This article has some complete and utter crap:

Some parts that I have issues with:

  • The headline: “‘Junk’ DNA mystery solved: It’s not needed.”  The headline is silly but alas it is consistent with what is in the article.
  • “So-called junk DNA, the vast majority of the genome that doesn’t code for proteins.”  So – they have redefined junk DNA as all non coding DNA?
  • “For decades, scientists have known that the vast majority of the genome is made up of DNA that doesn’t seem to contain genes or turn genes on or off.”  Apparently there is an entity out there known as “The Genome”.
And then the quoting of author and researcher Victor Albert, with no comments or responses from anyone else, is painful too.
  • “At least for a plant, junk DNA really is just junk – it’s not required.”  Except that they did not show this – they just showed that one plant can have a small genome and not have a lot of “junk” as they call it, which of course does not really say anything about what “junk” does or does not do in other organisms.
  • “Nobody’s really known what junk DNA does or doesn’t do”, apparently calling into question the 10,000-plus papers on the topic.

Apparently, from reading the rest, the whole point of this article is that people sequenced the genome of a bladderwort and it has a small genome but a lot of genes.  Oh and the organism is complex.  Therefore, apparently, it follows that

“The findings suggest junk DNA really isn’t needed for healthy plants — and that may also hold for other organisms, such as humans.”

And this leads us to ‘Junk’ DNA mystery solved: It’s not needed.

So – basically – if ONE FUCKING ORGANISM DELETES SOME OF THE NON PROTEIN CODING PORTIONS OF ITS GENOME THEN THIS MEANS THAT ALL NON CODING DNA IS USELESS.

Aaaaaaaaaaaaaaaaaaargh.

And for this evolutionary logic, I am awarding NBC News, Tia Ghose (the author of the piece) and Victor Albert, the 15th coveted Twisted Tree of Life Award.

Past winners:
UPDATE 5/17/13
Some other discussions of this paper and related to my critique (though not always agreeing with me)

HeLa genome sequenced w/o obtaining permission/consent from family – some comments and background

Last week David Coil in my lab reminded me that he had been wanting to borrow a copy of “The Immortal Life of Henrietta Lacks” by Rebecca Skloot.  I have read the book many many times and had told David I even had a preprint that Skloot or her publicist sent me before the book came out (I did not know Skloot then – I just got it because of my blog).  As I went to grab the preprint off my shelf in my office he said he wanted to read it now because the genome of the HeLa cells, which had been taken from Mrs. Lacks, had been published a few days before.  I was shocked.  I asked him if he knew if the authors of said paper had gotten consent before publishing it.  So I opened a web browser and googled and found the paper and some news stories and a press release from the group who did the sequencing.

Holy fuck.  They did not seem to have permission.  Uggh.  I had thought about this a lot because a few years ago I was thinking of writing a review of “The Immortal Life of Henrietta Lacks”. As part of that I started to write about the possibility of sequencing the HeLa genome and what that might mean.  I also did an April Fools joke relating to the topic: http://therealhela.blogspot.com.  And every time new sequencing technology comes along I have thought about – and discussed with others – the possibility of sequencing the HeLa genome.  And every time I got to this point I decided that it would be unethical, inappropriate, and downright stupid to do this without consent.  Note – my original plans for the book review involved a focus on the strange balance between openness and sharing in the history of HeLa and the lack of consent (e.g., see this blog post).

I was so angry about the lack of consent here that I took to Twitter.

And after that there was remarkably little discussion of the issue by others. What the fuck? People get up in arms about all sorts of minor things so why not get up in arms about this? Where were all the supposed genomic ethicists out there? How did this happen? Thankfully, yesterday a piece on the topic came out from Rebecca Skloot (it was in this morning’s New York Times) and it has launched this issue into a much more public discussion. So much discussion that I decided to storify it. See below.

View the story “#HeLa genome sequenced w/o consent (by Jonathan Eisen)” on Storify

Lots of discussions going on out there. And I think Rebecca deserves credit for writing this piece and bringing the story out more. I tried to get people going on Twitter and it was a slog — people did not seem that interested to be honest. Now – everyone seems interested. Including some who say they agree with Rebecca (and me) that it was a mistake to publish this genome.

Alas, I am wondering what these people thought before the Skloot article. Why did so many people just stand by and say nothing? Too busy? Did not occur to them that this could be an issue? Or something else?  Oh – and why did it not occur to Francis Collins and all the people behind ENCODE that this could be an issue? They published a lot of genomic data from HeLa cells and never once asked for consent or apparently even thought about it.

Anyway – it’s about time we as a community got off our butts and started discussing how to deal with the ethics of personal genome data.  This data will be coming out more and more.  We need to figure out how to handle it and the consent issues around it.  And we also need to do a better job of figuring out what to do with samples for which consent was not given but which are used.  Should we stop using HeLa cells?  Possibly.  If we want consent to use them – who will give it?  I don’t know the answers.  But I do know one thing – science should not simply proceed forward just because these questions are hard to answer.  Publishing the genome without consent or talking to the family was a very very very bad idea given that the ethical issues around consent here are murky.


UPDATE – 5 PM 3/24/13

Adding some notes about the press release and genome publication
Genome paper: – some key quotes of interest
  • Abstract
    • “To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome”
    • “Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins.”
  • Results
    • “produced nearly 1 billion reads of length 101 nt” (thus they produced 101 billion bases of DNA sequence information).
    • The read data are available in the European Nucleotide Archive (ENA) database under the accession number ERP001427. 
    • We report a compendium of genomic variation (CN, SNVs and SVs) as well as the first HeLa genome draft, which are available as VCF and FASTA files respectively 
    • We provide a tool to perform the translation of coordinates between GRch37 and our HeLa reference, 
    • Most variants in these HeLa cells thus represent common variants in the human population. The African-American population (to which Henrietta Lacks belonged) is spread between the African and European clusters, with the HeLa sample overlapping both. This demonstrates that although the genomic landscape of HeLa is strikingly different from that of a normal human cell, the population-specific SNV patterns are still detectable. 
  • Discussion
    • Since the establishment of the HeLa cell line in 1952, it has been used as a model for numerous aspects of human biology with only minimal knowledge of its genomic properties. Here we provide the first detailed characterization of the genomic landscape of one HeLa line relative to the human reference genome 
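As a quick sanity check on the read arithmetic quoted in the Results above, here is a minimal sketch in Python. The round figure of 1 billion reads is an upper bound (the paper says "nearly"), and the roughly 3.1 Gb human genome size used for the fold-coverage estimate is my assumption, not a number from the paper. HeLa is far from a normal diploid genome, so treat the coverage figure as a loose guide only.

```python
# Back-of-the-envelope check of the sequencing yield quoted above.
READS = 1_000_000_000       # "nearly 1 billion reads", so this is an upper bound
READ_LENGTH_NT = 101        # read length stated in the paper

# Assumption (not from the paper): approximate size of the human reference genome.
ASSUMED_GENOME_SIZE_NT = 3_100_000_000

total_bases = READS * READ_LENGTH_NT
approx_coverage = total_bases / ASSUMED_GENOME_SIZE_NT

print(f"Total bases sequenced (upper bound): {total_bases:,}")
print(f"Rough fold coverage under the assumed genome size: {approx_coverage:.1f}x")
```

So the "101 billion bases" figure follows directly from the read count and read length, and under the assumed genome size it works out to a few tens of fold coverage.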
Original press release (a copy of which I found here)
  • “The results provide the first detailed sequence of a HeLa genome,” explain Jonathan Landry and Paul Pyl from EMBL, who carried out the research. “It demonstrates how genetically complex HeLa is compared to normal human tissue. Yet, possibly because of this complexity, no one had systematically sequenced the genome, until now.”
  • “The HeLa genome had never been sequenced before, and modern molecular genetic studies using HeLa cells are typically designed and analysed using the Human Genome Project reference. This, however, misrepresents the sequence chaos that characterises HeLa cells, since they were derived from a cervical tumour and have since been adapting in laboratories for decades.”
  • “The study provides a high-resolution genetic picture of a key research tool for human biology. It highlights the extensive differences that cell lines can have from the human reference, indicating that such characterisation is important for all research involving cell lines and could improve the insights they deliver into human biology.”
  • Can we infer anything about Henrietta Lacks or her descendants from this sequencing?
    • No, we cannot infer anything about Henrietta Lacks’ genome, or of her descendants, from the data generated in this study. Firstly, the subtype of HeLa cells sequenced in this study has spent decades in labs, dividing and thus undergoing mutations and changes – they are very different from the original cells that started growing in 1951. Secondly, these initial HeLa cells were taken from Henrietta Lacks’ cervical cancer tumour – as cancer is a disease of the genome, the DNA of cancer cells is usually different to that of the patient. Without any genetic information from the original tumour or from Henrietta Lacks, it is impossible to distinguish which parts of the genome sequenced here originate from Mrs. Lacks, her tumour, or laboratory adaptation. The goal of this study was not to gain insights into Henrietta Lacks’ cancer or personal biology, but rather to provide a resource for researchers using HeLa cells.

UPDATE 3: 11:40 PM 3/25/13 Presidential Commission

Rebecca Skloot has unearthed a report from the Presidential Commission for the Study of Bioethical Issues which few people seem to have been aware of (I have heard nothing about it).

The report was released in October 2012 but got very very little coverage and I have never seen/heard it mentioned anywhere. But it covers a lot of ground of direct relevance to this HeLa story. The whole report is available here. Here are some choice statements (bolding by me)

“Large-scale collections of genomic data raise serious concerns for the individuals participating. One of the greatest of these concerns centers around privacy: whether and how personal, sensitive, or intimate knowledge and use of that knowledge about an individual can be limited or restricted (by means that include guarantees of confidentiality, anonymity, or secure data protection). Because whole genome sequence data provide important insights into the medical and related life prospects of individuals as well as their relatives who most likely did not consent to the sequencing procedure – these privacy concerns extend beyond those of the individual participating in whole genome sequencing. These concerns are compounded by the fact that whole genome sequence data gathered now may well reveal important information, entirely unanticipated and unplanned for, only after years of scientific progress.”

“Whole genome sequencing dramatically raises the privacy stakes because it necessarily involves examining and sharing large amounts of biological and medical information that is not only inherently unique to a single person but also has implications for blood relatives. Genomic information is inherited and determines traits like hair and eye color. Unlike a decision to share our hair or eye color, which does not reveal anything about our relatives that is not observable, a decision to learn about our own genomic makeup might inadvertently tell us something about our relatives or tell them something about their own genomic makeup that they did not already know and perhaps do not want to know. More than other medical information, such as X-rays, our genomes reveal something both objectively more comprehensive and subjectively (to many minds) more fundamental about who we are, where we came from, and the health twists and turns that life might have in store for us.”

“Because whole genome sequence information directly implicates relatives, psychological harms often are not limited to the person whose genome is voluntarily being sequenced and publicly disclosed. Even individuals who learn that they do not carry a harmful variant may experience “survivor’s guilt” if another family member is affected.”

“At the same time, individuals have a responsibility to safeguard their privacy as well as that of others, by giving thoughtful consideration to how sharing their whole genome sequencing data in a public forum might expose them to unwanted incursions upon their privacy and that of their immediate relatives. To be indifferent to the implications of disclosure of sensitive data and information about one’s self is to act irresponsibly. That being said, it can be good and virtuous to share sensitive data about oneself in appropriate circumstances, for example, for the good of public health research or public education.”

“Risks might also fall to blood relatives of these individuals who carry similar genomic variants, thereby raising the stakes of privacy concerns in whole genome sequencing compared with most other types of research.”


UPDATE 4: 3/26/13 – Some new stories / links


UPDATE 5: 3/26/13 – Rebecca Skloot on Morning Edition


UPDATE 6: 3/26/13 – Some more stories / discussion


UPDATE 7: 3/26/13 2 PM PST Still waiting for ENCODE to say something about whether they are going to take down their #HeLa data. See for example my Tweet from a few days ago



UPDATE 3/27

UPDATE 3/28
UPDATE 3/29

Gordon and Betty Moore Foundation hiring fellow for Marine Microbiology program #bioinformatics

Interesting Job Opportunity: Program Fellow, Marine Microbiology Initiative – Gordon and Betty Moore Foundation

See key details of the ad below:

The Bioinformatics Fellow position will be a 1-2 year term. 
The Program Fellow will: 
  • Contribute to developing strategy and implementation plans for the bioinformatics portfolio within the Marine Microbiology Initiative.  The fellow will prepare a needs assessment for cyberinfrastructure to support research and discovery by marine microbial ecologists.  The fellow will also coordinate bioinformatics-related activities within MMI. (60% time effort)  
  • Help convene, facilitate and participate in meetings about cyberinfrastructure related to the MMI community to gather and disseminate knowledge, and produce meeting reports or white papers. (30% effort)  
  • Collaborate with MMI Program Officers on grants management related to bioinformatics and data management. (10% effort)  

Key Responsibilities  
The Program Fellow will: 
  • Help develop a strategy and the implementation plans for cyberinfrastructures related to MMI activities. 
  • Communicate with the research community, other funders, commercial vendors, and others to prepare a needs analysis for cyberinfrastructure that includes a description of ongoing or past activities and existing infrastructure. 
  • Convene meetings and workshops in cooperation with grantees and other funders as necessary. 
  • Maintain solid knowledge of the field and key emerging trends.  
  • Contribute effectively on a variety of Program- and Foundation-wide issues beyond the Initiative as required. 
Experience and Education  
The candidate will have: 
  • A Doctorate degree in environmental microbiology, bioinformatics, biology or other relevant field.   
  • Demonstrated knowledge and/or experience with computing environments and sequencing technologies.   
  • Demonstrated experience with using bioinformatics tools.   
Competencies and Attributes  
The ideal candidate also will have:  
  • Good communications skills including demonstrated writing skills.  
  • Demonstrated knowledge of the bioinformatics community and/or existing cyberinfrastructure that supports environmental science.    
  • A desire to promote and work on a complex partnership and multi-stakeholder project to achieve tangible outcomes.  
  • Ability to synthesize diverse points of view to develop solutions. 
  • Demonstrated strong teamwork and interpersonal skills, with ability to develop productive relationships with colleagues, grantees, and stakeholders. Collegial and energetic working style.   
  • Demonstrated comfort with and experience in public speaking and meeting organization/facilitation.    
  • Demonstrated ability and openness to quickly adapt and adjust strategy and approach to changing conditions. 
  • Personal motivation to support the Foundation mission and goals.   
  • Ability and interest in traveling to grantee meetings, site visits, and national/international conferences.   

One way to keep up with new genome sequence publications – SIGS compilation

This is a very very helpful way to keep up with new genome sequence releases/publications: Genome sequences published outside of Standards in Genomic Sciences, October-mid November 2012 | Nelson | Standards in Genomic Sciences.  From Oranmiyan Nelson and George Garrity in the SIGS Journal.  It is a bit mind boggling how many genome sequences are being determined and published.  Fun.  But mind boggling.  Anyway – good to have someone trying to keep track.  Also see GOLD: Genomes OnLine Database.

Reading in detail Carl Woese’s 1998 "Manifesto on Microbial Genomics" for the first time …

I am a bit stunned by this paper from Carl Woese in 1998 which I was aware of but have not read in detail until now: ScienceDirect.com – Current Biology – A manifesto for microbial genomics

I re-discovered it because I am making a compilation of papers by Woese in relation to the tribute page I have set up.  And the title (a manifesto about microbial genomics) combined with the date (1998 – early in the genome sequencing era) struck me as something worth looking at.  Plus I knew others (e.g., Phil Hugenholtz, Nikos Kyrpides, …) had mentioned this paper to me so I figured – hey – how about actually reading it in detail.  And fortunately it is freely available at the Current Biology web site (not sure why that is actually).  Anyway – what I found in the paper is basically an argument for much of my career from 1998-2008.

Some choice lines in here but the crux is as follows

The first order of business in microbial genomics should be a phylogenetically representative genomic screen of the microbial world. In other words, all the major microbial taxa and their subdivisions — which are the major source of biological diversity on Earth — should be represented by several genome sequences. There are now more than 30 recognized major eubacterial taxa — each the phylogenetic equivalent of a eukaryotic kingdom — and at least half that number in the (far less well characterized) Archaea; not to mention the yet-to-be-discovered kingdoms among the unicellular eukaryotes.

This basically lays out the Tree of Life project I co-ran at TIGR and the Genomic Encyclopedia of Bacteria and Archaea project I co-ran / run at the DOE JGI.

The ending is perfect

This is not the place to go into the specifics of which microbial genomes would be most useful. I would suggest, however, that a phylogenetic tree hang on the wall of every laboratory in which microbial genomes are being sequenced — for inspiration.

Somehow I had missed the crux of this paper until now.  I think it is worth reading by everyone out there working on microbes and/or their genomes.

Oh – and here is the compilation of Woese’s papers I am making in Mendeley.

http://www.mendeley.com/groups/2940711/papers-by-carl-woese/widget/21/3/

Story behind the paper: Corey Nislow on Haloferax Chromatin and eLife

This is fun.  Today I am posting this guest post from Corey Nislow in my continuing “Story behind the paper” series.  The history of this post is what is most fun for me.  A few weeks ago I received this email from Corey:

Hi Jonathan, I hope this mail finds you well.
I wanted to alert you to a study from our lab that will be coming out in the inaugural issue of eLIFE.
After reading your PLoS ONE paper on the Haloferax volcanii genome (inspiration #1) I ordered the critter, prepared nucleosomes and RNA and we went mapping. Without a student to burden, I actually had to do some work…
Anyhow, we found that the genome-wide pattern of nucleosome occupancy and its relation to gene expression was remarkably yeast like. Unsure of where to send the story, we rolled the dice with the new open access journal eLIFE (inspiration #2) and the experience was awesome. I’m quite keen to pursue generating a barcoded deletion set for Hfx.
here’s the paper (coming out Dec. 10) if you’re curious.

And a PDF of the paper was attached.

And I wrote back quickly in my typically elegant manner:

completely awesome

But then I thought better of it and wrote again

So – can I con you into writing a guest post for my blog about the story behind this paper?  Or if you are writing a description somewhere else I would love to share it

And he said, well, yes.  And with a little back and forth, he wrote up the post that is below.  Go halophiles.  Go Haloferax.  Go open access.  Go science.


Chromatin is an ancient innovation conserved between Archaea and Eukarya  – The story behind the story
By Corey Nislow

My group first became interested in understanding the global organization of chromatin in early 2005 when Lars Steinmetz (now program leader at the EMBL) led a team effort at the Stanford Genome Center to design a state-of-the-art whole genome tiling microarray for Saccharomyces cerevisiae. These were heady times at Ron Davis’ Genome Technology shop and the array was another triumph of technology and teamwork. The array has over 7 million exceedingly small (5 µm²) features. The history of how this microarray transformed our understanding of the transcriptome began in 2006. As Lars’ group dug deeper, the extent of antisense transcription and its role in the regulation of expression became clear.

The availability of this array and its potential for asking interesting questions inspired me to convince William Lee, a new graduate student in my group (now at Memorial Sloan-Kettering), to embark on a seemingly simple experiment. The idea was to ask if we could use the classic micrococcal nuclease assay to define nucleosome positioning on a DNA template. But rather than using a short stretch of DNA that could be assessed by radioactive end-labeling and slab gel analysis, we decided the time was right to go “full-genome”. Accordingly, the template was all ~12.5 Mb of the yeast genome. Will systematically worked out conditions appropriate for hybridization, wrote the software to extract signal off the array (we were flying blind as the array did not come with an instruction manual) and produced an output that was compatible with the genome browsers of the time. Will’s computational background proved critical here (and at several later stages of the project). The result of this experiment was a map of the yeast genome with each of its approximately 70,000 nucleosomes charted with respect to their occupancy (the length of time that the nucleosomes spend in contact with the DNA) and positioning (the location of a particular nucleosome relative to specific sequence coordinates) in a logarithmically growing population of cells (the paper). Both occupancy and positioning regulate access of most trans-acting factors for all DNA transactions. Working with my new colleague Tim Hughes at the University of Toronto, we began to mine this data, focusing first on how the diverse occupancy patterns correlated with aspects of transcription, e.g. the presence of transcription factor binding sites, the level of expression of particular genes, and the like. With this data for the entire genome, we could systematically correlate nucleosome positioning/occupancy with functional elements, sequence logos and structural features. 
Des Tillo, a graduate student in Tim’s lab and now a research fellow with Eran Segal, was able to build a model that could predict nucleosome occupancy. The correlation (R=0.45) was not great but it was miles better than anything that existed at the time. Tim and Eran’s labs, working with Jason Lieb and Jonathan Widom, refined the model to greater accuracy (2009 model).

Our original study (essentially a control experiment to define the benchmark nucleosome map in yeast) has been widely cited; many of these citations have come from what were two opposing camps, the sequence advocates and the trans-acting proponents. The sequence folks posited that nucleosome position is directed by the underlying sequence information while the trans-acting folks see chromatin remodelers as having the primary role. Having last worked on chromatin in 1995 as a postdoc in Lorraine Pillus’ lab (cloning yeast SET1), it has been a scientific treat to be both a participant and observer in this most recent renaissance of chromatin glory.

The protocol

As a reminder, the micrococcal nuclease (MNase) assay relies on the preference of this nuclease to digest linker DNA. By chemically crosslinking histones to DNA with formaldehyde, digesting with MNase, then reversing the crosslinks and deproteinizing the DNA, you obtain two populations of DNA: those protected from digestion (and presumably wrapped around nucleosomes in vivo) and a control sample that is crosslinked but not digested (genomic DNA). The former sample becomes the numerator and the latter the denominator, and you take the ratio between the two. Initially we compared the microarray signal intensities; now next-generation sequence counts are used to define nucleosomal DNA. This cartoon depicts the array-based assay, but simply swap in an NGS library step for the arrays to upgrade to the current state-of-the-art.
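To make the numerator/denominator idea concrete, here is a minimal sketch in Python of how a per-position occupancy score could be computed from sequence counts. It is an illustration only and not the pipeline used in the paper: the function name `occupancy_log2_ratio`, the depth normalization, the log2 transform and the pseudocount are all assumptions added for the example.

```python
import math


def occupancy_log2_ratio(nucleosomal, genomic, pseudocount=1.0):
    """Return per-position log2(nucleosomal / genomic) occupancy scores.

    nucleosomal: read counts per position from the MNase-digested sample
    genomic:     read counts per position from the undigested control
    pseudocount: assumed small constant that avoids division by zero
    """
    if len(nucleosomal) != len(genomic):
        raise ValueError("coverage tracks must have the same length")
    nuc_total = sum(nucleosomal)
    gen_total = sum(genomic)
    if nuc_total == 0 or gen_total == 0:
        raise ValueError("each track needs at least one read")
    # Scale the control so both samples have the same total depth.
    scale = nuc_total / gen_total
    return [
        math.log2((n + pseudocount) / (g * scale + pseudocount))
        for n, g in zip(nucleosomal, genomic)
    ]


# Toy example: two protected positions followed by two digested positions.
scores = occupancy_log2_ratio([30, 30, 0, 0], [10, 10, 10, 10])
```

Positive scores mark positions enriched in the protected (nucleosomal) sample, and negative scores mark positions depleted relative to the genomic control.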

In 2007 we were restricted to array-based assays (as were most genomic studies) and frankly, the 4 bp resolution of the arrays was pretty amazing. But the introduction of next-generation sequencing opened up the possibility of charting nucleosomes in worms or wildebeest or almonds; there was nothing to stop you other than the short read lengths at the time. The read length issue has since disappeared as the “short-read” platforms can easily cover the length of a nucleosome-protected DNA fragment of ~150 bases.

So that brings me to the paper I’d like to highlight today, which asks whether (and how) chromatin is organized in the archaea and, further, whether there is any correlation between archaeal chromatin architecture and gene expression.

My extreme background
Just like the universal fascination of kids with dinosaurs, I was captivated by the discovery of life in extreme environments like boiling water or in acid that could melt flesh on contact. Teaching intro bio, I would try to provoke the students by claiming that discovering extraterrestrial life will be a letdown compared to what we can find on earth. So while my students were occupied with classifying yeast nucleosome and transcriptome profiles in different mutants and drug conditions, I had the rare opportunity to indulge my curiosity. Jonathan E’s talks on the dearth of information on microbes, combined with my re-discovery of the early papers from Reeve and Sandman (see review) had me hooked. Reading the literature was like discovering the existence of a parallel chromatin universe. Archaeal histone complexes were tetramers (as opposed to the octamers of eukaryotic nucleosome core particles) but most everything else was similar: they wrapped DNA (60-80 bases compared to 147 for yeast) and although archaeal histones did not share primary sequence similarity to eukaryotic nucleosomes, at the structural level they resembled histone H3 and H4 in eukaryotes.

Working from ignorance
Choosing the particular archaeon to study was dictated by one criterion, the ability to grow it in the lab easily without resorting to anaerobic conditions or similar calisthenics. Again, I was fortunate in that the halophilic archaeon Haloferax volcanii fit the bill, but more importantly, there was a wealth of literature on this critter, including a well-annotated genome (thanks again Jonathan!) and an impressive armamentarium of genomic tools. Indeed the work of Allers, Mevarech and Lloyd and others has established Hfx. volcanii as a bona fide model organism with excellent transformation, gene deletion, gene tagging and gene expression tools.


Home for Haloferax volcanii


This photograph shows salt pillars that form in the Dead Sea, which borders Jordan to the east and Israel and the West Bank to the west. The salt concentration in the water can exceed 5M!

So cool, now all we had to do was prepare nucleosomal DNA and RNA from Haloferax, sequence the samples, build a map and see where it led us. With everyone in the lab otherwise occupied, I tried to grow these critters. At first I was convinced I’d been out of the lab too long as nothing grew. Actually I just needed to be a little patient. Then the first cell pellets were so snotty that I aspirated them into oblivion. Finally, I had plenty of pellets and my talented yeast nucleosome group adapted their protocols such that we got nice nucleosome ladders.

This was a pleasant surprise and one we did not take for granted given the high GC content of the genome (65%). We then turned to isolating RNA. Without polyA tails for enrichment, our first attempts at RNA-seq were 95% ribosomal. Combining partially successful double-stranded nuclease (DSN) treatment with massive sequencing depth, we were able to get fairly high coverage of the transcriptome. Here’s where Ron Ammar, a graduate student supervised by me, Guri Giaever and Gary Bader, stepped in and turned my laboratory adventures into a wonderful story. Ron mapped the reads from our nucleosome samples to the reference genome and found what to my eyes looked like a yeast nucleosome map, only at half scale.

Here were well-ordered arrays in the gene bodies and nucleosome-depleted regions at the ends of genes. The Haloferax genome is a model of streamlining and as a consequence, intergenic regions are tiny and hard to define. With little published data to guide the definition of archaeal promoters and terminators, the transcriptome map saved us. Ron focused on the primary chromosome in Haloferax and hand-curated each transcription start and stop site based on the RNA-seq data. This is when we realized we had something interesting. Here were nucleosome-depleted promoters and nucleosome-depleted terminators, and when we constructed an average-o-gram of all the nucleosome signatures for each promoter on the main chromosome, it looked like this….
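For readers curious how an average-o-gram of this kind can be assembled, the following Python sketch aligns an occupancy track on a set of anchor positions (for example, curated transcription start sites) and averages the windows. It is a hedged illustration rather than the analysis code behind the figure: the function name `average_profile`, the strand-flipping convention and the decision to skip windows that run off the end of the track are assumptions made for the example.

```python
def average_profile(occupancy, anchors, flank):
    """Average occupancy in a window of +/- flank around each anchor.

    occupancy: list of per-position occupancy scores for one chromosome
    anchors:   list of (position, strand) tuples, strand is '+' or '-'
    flank:     number of positions to include on each side of the anchor
    Windows that would extend past either end of the track are skipped.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos, strand in anchors:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(occupancy):
            continue
        window = occupancy[start:end]
        if strand == "-":
            # Flip minus-strand windows so upstream is always on the left.
            window = window[::-1]
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        raise ValueError("no anchor had a complete window")
    return [t / used for t in totals]


# Toy example: one plus-strand and one minus-strand anchor.
track = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
profile = average_profile(track, [(2, "+"), (7, "-")], flank=2)
```

The middle element of the returned list corresponds to the anchor itself, so a dip there would indicate a nucleosome-depleted region at the aligned sites.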

The take home

The data strongly suggested that archaeal chromatin is organized in a manner very similar to that of eukaryotes. And further, the correlation between gene expression and nucleosome positioning, particularly with respect to the +1 and -1 nucleosomes, was conserved. This conservation invites some interesting speculation. According to Koonin and colleagues, the common ancestor of eukaryotes and archaea predates the evolutionary split that gave rise to the euryarchaeal and crenarchaeal lineages. Both of these branches have bona fide nucleosomes; therefore it would seem parsimonious to assume that the ancestor of these two branches also organized its genome into chromatin with a nucleosomal scaffold. The similarities between the chromatin in archaea and eukaryotes, and the correlation between nucleosome occupancy and gene expression in archaea, raise the interesting evolutionary possibility that the initial function of nucleosomes and chromatin formation might have been to regulate gene expression rather than to package DNA. This is consistent with two decades of research that has shown that there is an extraordinarily complex relationship between the structure of chromatin and the process of gene expression. It also jibes with in vitro observations that yeast H3/H4 tetramers can support robust transcription, while H2A/H2B tetramers cannot.

It is possible, therefore, that as the first eukaryotes evolved, nucleosomes and chromatin started to further compact their DNA into nuclei, which among other things, helped to prevent DNA damage, and that this subsequently enabled early eukaryotes to flourish. This observation is so exciting to me because it brings up so many questions that we can actually address, such as: if there are nucleosomes comprised of histones, where are the histone chaperones? And further, despite the conventional wisdom that archaeal nucleosomes are not post-translationally modified, this remains to be confirmed (or denied) experimentally. If conventional wisdom is correct and archaeal histones are not post-translationally modified, then when did this innovation arise? There are more than enough questions to keep the lab buzzing!

Publishing the paper
Because I truly believed that this result “would be of general interest to a broad readership” we prepared a report for Science which was returned to us within 48 hours. The turnaround from Nature was even faster. I had received emails from eLIFE several months previously, and after reading the promotional materials and the surrounding press, we took our chances at eLIFE and hoped for the best. The best is exactly what we got. Within a few days the editors emailed that the manuscript was out for peer review and four weeks later we received the reviews. They were unique. They outlined required, non-negotiable revisions (including a complete resequencing of the genome after MNase digestion but without prior cross-linking) but contained no gray areas and required no mind-reading. With all hands on deck, we resubmitted the manuscript in four weeks and were overjoyed with its acceptance. Of course with N=1, combined with a positive outcome, it’s hard to be anything but extremely positive about this new journal. But I think the optimism is defensible: the reviews were transparent, and the criticisms made it a better paper. The editorial staff was supportive and gave us the opportunity to take the first stab at drafting the digest which accompanies the manuscript.

NOTE ADDED BY JONATHAN EISEN.  A preprint of the paper is available here.  Thanks to the eLife staff for helping us out with this and encouraging posting prior to formally going live on the eLife site.

What’s next and what’s in the freezer
This work represents the Haloferax reference condition, with asynchronously growing cells in rich, high-salt media. We recently collected samples of log phase cultures exposed to several environmental stresses and samples from lag, log and stationary phases of growth to chart archaeal nucleosome dynamics. We are also refining a home-made ribosomal depletion protocol to make constructing complementary transcriptome maps considerably cheaper. Finally, it is exciting to contemplate a consortium effort to create a systematic, barcoded set of Haloferax deletion (or disruption) mutants for systematic functional studies.

Mille grazie to Jonathan E. for inspiring me to look at understudied microbes and for encouraging me to walk the walk with respect to publishing in open access forums. And for letting me share my thoughts as a guest on his blog.

The tree of life from Haloferax’s perspective. Artwork by Trine Giaever.

Get the genomes of up to 12 type strains of bacteria and/or archaea sequenced, for free

Barny Whitman asked me to post this announcement and, well, I am.  I made one edit below (see strikethrough) in honor of Norm Pace.

Genomic Sequencing of Prokaryotic Bacterial and Archaeal Type Strains

The Community Sequencing Program (CSP) Quarterly Microbial call of the DOE Joint Genome Institute provides a great opportunity to obtain draft genomic sequences of the type strains of bacterial and archaeal species. The type strains may also include proposed species prior to publication. Type strains must be relevant to DOE mission areas, such as bioenergy, biogeochemistry, bioremediation, carbon cycling, and phylogenetic diversity. However, strains of human pathogens and human-associated species are not eligible. Proposals for genome sequencing of type strains can be submitted through the CSP Quarterly Microbial call, whose deadline is December 17, 2012, with approval usually being completed within one month. Up to 12 strains can be included in each proposal. Proposals for larger numbers of strains need to be submitted to the CSP annual call in the spring. If you cannot make the December call, Quarterly calls are also scheduled for March 25, June 17, and September 23, 2013.

Proposals may be completed on-line at: http://proposals.jgi-psf.org/proposals. You will need to register and sign in to this server. Once on the server, follow the links to the “CSP Quarterly Microbial/Metagenome”. All strains will have to have been deposited in a culture collection, including proposed type strains prior to publication. If a culture collection ID is not available, you can attach a copy of the Certification of Availability. Once approved, you will need to provide 5-10 µg of high molecular weight DNA.

For questions, contact Barny Whitman, University of Georgia (whitman@uga.edu).

Convoluted title, cool paper in #PLoSGenetics on relative of insect mutualists causing a human infection

Saw this tweet a few minutes ago:

The title of the paper took me a reread or two to understand.  But once I got what they were trying to say I was intrigued.  And so I went to the paper:  PLOS Genetics: A Novel Human-Infection-Derived Bacterium Provides Insights into the Evolutionary Origins of Mutualistic Insect–Bacterial Symbioses.  And it is loaded with interesting tidbits.  First, the first section of the results details the history of the infection in a 71-year-old male and his recovery and the isolation and characterization of a new bacterial strain.  Phylogenetic analysis revealed this was a close relative of the Sodalis endosymbionts of insects.

And then comparative genomics revealed a bit more detail about the history of this strain, its relatives, and some of the insect endosymbionts.  And plus, it allowed the authors to make some jazzy figures such as

And this and other comparative analyses revealed some interesting findings.  As summarized by the authors

Our results indicate that ancestral relatives of strain HS have served as progenitors for the independent descent of Sodalis-allied endosymbionts found in several insect hosts. Comparative analyses indicate that the gene inventories of the insect endosymbionts were independently derived from a common ancestral template through a combination of irreversible degenerative changes. Our results provide compelling support for the notion that mutualists evolve from pathogenic progenitors. They also elucidate the role of degenerative evolutionary processes in shaping the gene inventories of symbiotic bacteria at a very early stage in these mutualistic associations.

The paper is definitely worth a look.