The hypocrisy of most projects with "Open" data release

There has been a growing trend in biological research for scientists to release their data in some way or another prior to publication. This data release is meant to promote the advancement of science, and it frequently does. This is perhaps best seen with genome sequencing projects, such as the public version of the “Human Genome Project.” In many if not most cases, centers that do the bulk of the sequencing work release the sequence data for searching by others, even before publishing papers on their own data. In most cases, restrictions are placed on how the data can be used, but the data is still released for others to look at.

This is of course in contrast to how much of science works, with researchers keeping their data to themselves until they are ready to publish something. The genome centers that have made their data available prior to publication deserve some credit for this openness, especially since data release by genome centers has in general gone so far beyond what most biology researchers do. In fact, many of these centers go out of their way to promote getting such credit (they even got Clinton and Blair to play along). The best example of this was the public human genome project, which made multiple claims about how great they were for humanity for releasing the data “within 24 hours of gathering it.” This data release policy was captured in something that became known as the Bermuda Principles, due to a meeting that took place in Bermuda (see a nice summary of this by John Sulston here).

What is appalling to me, however, is that these same centers that try to take credit for their openness then turn around and usually publish their papers in non Open Access journals (for those who do not know, this means that one then has to pay money, frequently enormous sums of money, just to read the paper). I do not understand this. A paper about an analysis someone did on a data set may in fact be more valuable to the community than the data itself. If the genome centers like TIGR, JGI, Sanger, Whitehead, etc. really wanted to be on the side of openness, they should stop publishing their papers in non Open Access journals. Unfortunately these places publish very few of their papers in Open Access journals.

For example, the Joint Genome Institute (JGI), which I am now affiliated with, is continually showing two faces on this issue. On the one hand, they issue press release after press release regarding their release of data on various genome projects (e.g., here). That is fine, although a little over the top sometimes. But then they almost never publish any of their work in Open Access journals (e.g., see their latest press release on a paper published about a genome in Science, a non Open Access journal). Any taxpayers out there should be disappointed with this, as the genome centers get TONS of money to carry out this work for the public benefit. For the papers on that work to then be hidden behind huge subscription fees is a waste of your money.

This is particularly surprising coming from JGI, since JGI is run directly by the Department of Energy (unlike most other centers, which are either private or part of a university). Thus apparently DOE does not want to follow even the recommendations of Congress and the Senate regarding Open Access to publications. Nor does DOE apparently want to do the right thing by requiring their institutes to publish in Open Access journals. Too bad. Taxpayers hopefully will begin to get more and more upset about the waste of their money as these centers take enormous amounts of the federal science budget and convert it into documents that only a few can read.

The Blogger World Favors Open Access Publications

Well, even though the traditional press did not pick up the story about the Tetrahymena genome paper, it seems that lots of blogs and online news sources picked it up.

Here are some:

Maybe the press release from TIGR did not excite the “real” press too much, I do not know. But nevertheless, it is good to see people discussing the article and even better to see that the article is currently the #1 viewed article for the week at PLoS Biology. I assume that most of this comes from Slashdot running an item about the article, but I am not 100% sure.

I think the blogger world seems to run stories about Open Access publications much more than about non Open Access publications, since bloggers can read them freely. It would seem that the blogger world is helping to promote Open Access papers, and this may explain why in the recent past I have gotten much more response to Open Access papers than even to papers in Nature or Science.

It is so important for scientific research to reach all people, not just scientists who can afford subscriptions to journals. Thus a partnership between bloggers and open access publications seems perfect for the new way of doing science.

Tackling the hairy beast – Tetrahymena genome

Just thought I would put out a little self-promotional posting here on a paper we have published today on the genome of a very interesting organism called Tetrahymena thermophila. This organism is a single-celled eukaryote that lives in freshwater ponds.

This species has served as a powerful model organism for studies of the workings of eukaryotic cells. Studies of this species have led to some fundamental discoveries about how life works. For example, telomerase, the enzyme that helps keep the ends of linear chromosomes from degrading, was discovered in this species. This may not seem too important, but many folks think that degradation of chromosome ends in humans is involved in aging. Perhaps even more importantly (to me at least), studies of this species were fundamental to the discovery that RNA can be an enzyme. This discovery of catalytic RNA revolutionized our understanding of how cells work and how life evolved. Tom Cech and Sidney Altman were given the Nobel Prize in 1989 for this discovery.

Many (including myself) believe that having the genome sequence of this species will further spur research and its use as a model organism. In addition, we believe that some of the findings we report in our paper will further cement the importance of this species. For example, this species, though single-celled, encodes nearly as many proteins as humans and possesses many processes and pathways shared with animals but missing from other model single-celled species.

The project that led to this publication was undertaken while I was at TIGR (The Institute for Genomic Research) and involved a collaboration among people at dozens of research institutions around the world. It all started in 2001 when Ed Orias and his colleagues sought to see if anyone at TIGR would be interested in putting in a grant to sequence this species’ genome. I responded to the email saying I was interested, especially since I had interacted with multiple people who used this species as a model system (e.g., Laura Landweber at Princeton and Laura Katz at Smith). So I went to a FASEB meeting where the Tetrahymena Genome Steering Committee was meeting and discussed with them how TIGR might help sequence the genome. And after talking to other genome centers, they selected TIGR to put in a grant proposal with them.

We ended up getting funding from two grant proposals – one from NIGMS and the other from the NSF Microbial Genome Sequencing Program. The sequencing was done in a rapid burst at the new Joint Technology Center which TIGR shares with the Venter Institute. And then we spent ~1.5 years analyzing the sequence data (and assemblies) that came out and in the end we fortunately were able to get our paper into PLoS Biology, in my opinion the best place available to publish biology research.

Importantly, PLoS Biology is Open Access, which allows anyone anywhere to read about our work. This goes well with the free and open release we made of the genome sequence data. In fact, many people published papers on the genome before we did (sometimes scooping us). In the end, I accepted the risks of releasing the genome data with no restrictions in exchange for advancing research on this organism. I think this risk was well worth it, as we still got our big paper published and the field has advanced more rapidly than if we had not released the data.

Other links that may be of interest to people:

Eisen, J., Coyne, R., Wu, M., Wu, D., Thiagarajan, M., Wortman, J., Badger, J., Ren, Q., Amedeo, P., Jones, K., Tallon, L., Delcher, A., Salzberg, S., Silva, J., Haas, B., Majoros, W., Farzad, M., Carlton, J., Smith, R., Garg, J., Pearlman, R., Karrer, K., Sun, L., Manning, G., Elde, N., Turkewitz, A., Asai, D., Wilkes, D., Wang, Y., Cai, H., Collins, K., Stewart, B., Lee, S., Wilamowska, K., Weinberg, Z., Ruzzo, W., Wloga, D., Gaertig, J., Frankel, J., Tsao, C., Gorovsky, M., Keeling, P., Waller, R., Patron, N., Cherry, J., Stover, N., Krieger, C., del Toro, C., Ryder, H., Williamson, S., Barbeau, R., Hamilton, E., & Orias, E. (2006). Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote. PLoS Biology, 4 (9). DOI: 10.1371/journal.pbio.0040286