Metagenomics 2006

Just got back from the “First International Conference on Metagenomics” which was held in San Diego. Although this is clearly NOT the first international conference on metagenomics, it was not bad.

For those who do not know, metagenomics is the term used when people do DNA sequencing directly from environmental samples without isolating organisms in the first place. This term was coined by Jo Handelsman et al. in an article in 1998, where they referred to all the DNA and its coding potential in soil as the soil “metagenome.”

The meeting was hosted by UCSD/CalIT2, which are trying to move into the metagenomics field in large part due to the large grant they have from the Moore Foundation to build a metagenomics database with the Venter Institute. The database is called CAMERA and it is planning to have its first release shortly.

To be honest, even though I am involved in CAMERA, the UCSD/CAMERA folks would be better off not trying to make it seem like they are the only people organizing meetings in this area. Nevertheless, the meeting was pretty good.

There were talks by people focusing on different aspects of metagenomics, including data collection, databasing, and data analysis as well as some interesting biology. My favorite was one by Jeff Gordon, from Washington University in St. Louis. He is doing some of the most spectacular stuff in studies of the human microbiome and he discussed a few of the studies from his group. Most importantly, he emphasized the use of germ-free animals as a model system. Basically, they raise animals in completely sterile conditions and have produced mice and fish and other species that have no microbes associated with them. This allows them to do experimental manipulations to ask controlled questions about host-microbe interactions. My other favorite talk was by Ford Doolittle. Even though I disagreed with some of the things he said, he always challenges the audience to rethink their assumptions. In this case, he talked about the species concept in microbes and why he thinks it does not have much use.

Overall, I got the feeling that people were being a little too worried about the difficulties in metagenomics. Yes, analyzing sequence data from environmental samples is complicated. Yes, all the bioinformatics is harder because you are dealing with a mixed sample of DNA fragments and you do not know which fragment comes from which organism in the sample. And yes, the databasing and data analysis can be very complicated because the amount of raw data and metadata can be huge. But in the end, metagenomics has the potential to be an incredibly powerful tool in studies of microorganisms in nature. And the fact that it is somewhat harder than standard genome sequencing does not mean that we are not already learning a lot from it. What we need to keep in mind is that it is simply a tool – and to try and turn it into a field (which is what it seemed like some of the players would like) is a mistake.

If you are interested in the meeting itself, the talks and discussion sessions are available here.

Genomics Education highlighted at 14th Annual International Meeting on Microbial Genomics

Just got back from the 14th Annual International Meeting on Microbial Genomics, where I gave a talk on microbial symbiont genomics. This was one of the best meetings I have been to in a while. It had the right combination of everything including:

  1. Many excellent talks and posters (OK, in the interest of not upsetting people for not saying their talk or poster was great, I will not make a big list of all the ones I thought were good, but I will give a few highlights below).
  2. Excellent location (UCLA's Lake Arrowhead Conference Center, which is in the mountains east of Los Angeles). This is a place that is very conducive to getting to know colleagues and it almost forces interaction among people. There is one central building where there is a dining hall, a nice deck if you want to eat outside, the conference room, rooms for posters, and a large living room for hanging out. The rooms for sleeping are mostly great (e.g., mine was a split-level condo-like structure with a living room and a bedroom/bath on floor one and a bedroom/bath on floor two). And being in the mountains is very pleasant. Plus there is a pool, jacuzzi, and sports facilities that are very nice. The only annoying thing is that the lake itself, which is 100 yards away, is really almost private, with most of the shoreline occupied by houses and private docks.
  3. Good food. The food is not spectacular or anything but better than the food at 90% of the conferences I have been at.

In terms of talks, there were quite a few that were both on interesting topics and very well presented. For example, Jessica Green from U. C. Merced gave a great talk about spatial distributions of microorganisms, Julian Parkhill from the Sanger Center put together a really nice story about mechanisms by which microbial pathogens generate phenotypic diversity, and Julie Huber from MBL impressed many with her talk about the “Deep Rare Biosphere.”

But to me, the best two talks were ones on science education reform by two people from UCLA. Erin Sanders-Lorenz presented a summary of a course she has been teaching at UCLA that has students doing “phylogenomic” analysis, which takes them from isolating and culturing organisms from environmental samples to building evolutionary trees of genes isolated from these cultured species. This seemed like a very creative, hands-on, novel way to teach students the excitement of science and some things about evolution. It sounded so well thought out that I asked for (and got) a copy of her lab manual.

Much as I liked this class, the one described by Cheryl Kerfeld knocked my socks off. She described a program they have developed at UCLA called the Undergraduate Genomics Research Initiative. This is an interdepartmental multi-course collaboration with the central theme involving the sequencing and analysis of the genome of a bacterium called Ammonifex degensii. The various courses are organized around a central course on genome sequencing. The linked courses include ones in many different departments at UCLA as well as various courses at other universities. They have clearly given enormous thought to how to do a truly project-based course which likely will catch students' attention and interest much more than standard lectures or standard labs.

There have been other successful hands-on genome sequencing courses before – perhaps the first being one by Brad Goodner at Hiram College, who had students participate in the sequencing and analysis of the genome of Agrobacterium tumefaciens (e.g., see a press release here). The Kerfeld UCLA UGRI program sounds like it has gone to the next level by integrating many courses across departments and by having creative ways to encourage participation of students in multiple aspects of the project. It really is worth taking a look at the UCLA UGRI program’s web site.

Other tidbits about the meeting:

  • Jeffrey H. Miller from UCLA organized it
  • This is the same Jeffrey Miller who identified most of the mutator genes in E. coli with a really creative genetic screen
  • There was another Jeffrey Miller from UCLA at the meeting (will leave this up to google for people to figure out who this other Miller is).

Top 10 Novel ways to contribute to the Open Access movement

I am pleased to hear from more and more colleagues about how they support the Open Access movement in scientific publishing. Open Access journals are getting stronger and stronger and the tide is clearly turning towards Open Access. However, there are still many things that need to be achieved in order for Open Access to really become the rule. For example, of the colleagues who seem somewhat supportive of Open Access, but who still publish in non Open Access journals, the most common excuse is “I really need this for my resume” or something like that. What they mean is, the non Open Access journal they are trying to publish in is better known to their colleagues (and tenure review committees and job search committees) than a similar Open Access journal. In other words, they support Open Access in their heart, but are worried about the consequences for their careers.

I appreciate the concern of people worried about their jobs or promotions. Therefore, I think it is necessary for supporters of Open Access to turn up the heat even more and try to set up an environment where people do not have to make this choice. How can we do this? Well, I thought I had some good ideas about this but then saw Peter Suber’s excellent web site about this here so I will avoid trying to be comprehensive.

Instead, I have made my personal top 10 list of ways to support Open Access that can make your life better and easier too. In italics are things you can do to show you REALLY support Open Access:

  • 1. Review.
    • Do not review for non Open Access journals. Ever. Not only will this save you time, it will ratchet up the cost of business for non Open journals.
    • You can be really insidious about this and not even answer requests for review and gum up their works that way. This is best reserved for Elsevier journals.
  • 2. First timers.
    • Encourage colleagues who are Open Access virgins to submit some (or better yet, all) their papers to Open Access journals. Some will love it and never go back.
  • 3. Promote.
    • For papers you publish in Open Access journals, if you put out a press release, make the open nature a part of the release (e.g., see our release for the Tetrahymena genome paper).
    • Send the press release to your program officer.
  • 4. Legislate.
    • Write to your legislators and librarians and university officials expressing support for Open Access.
    • If you want to be extra supportive, write to local lobbying groups such as medical support groups and tax reduction advocates pointing out the follies of non Open Access.
  • 5. Promote II.
    • Find a good Open Access publication and promote it in some way – by writing about it in a blog or reviewing it for things like Faculty of 1000. Submit reviews there only for Open Access articles.
    • To be a true supporter, ONLY write reviews and commentaries about Open Access publications. Pretend like others do not exist.
  • 6. Public.
    • Promote Open Access publications (e.g., your own) to the public. Since the public cannot get access to most non Open Access publications, it is hard to use them to get the public interested in science. But it works well with Open Access publications.
  • 7. Fair use.
    • Take material from Open Access publications and (if allowed) use it to make “Open” educational materials, such as review papers or powerpoint presentations. People should be able to use it (e.g., for teaching) without worrying about copyright issues. Just make sure to cite them correctly.
  • 8. Citations.
    • For citations, when all else is equal, choose to cite Open Access publications. Not only will this increase their Impact Factor, readers will be grateful because they will be able to obtain the papers more easily.
    • Note – I am not advocating not citing others, but just when you have to choose, to choose well.
  • 9. Collaborate.
    • Choose collaborators who support Open Access principles.
    • If you want to really be good, only enter a collaboration if your collaborator is willing to publish the shared findings in Open Access journals.
    • Do not collaborate with those not willing to make such an agreement.
  • 10. Data
    • Find a way to make all your data sets and supplementary material Openly available, regardless of where you publish.
    • My favorite twist on this – a viral license to use your data. If someone wants to make use of unpublished data you have, only share it if they are willing to publish results in an Open Access journal. I am sure some people will say this is against the spirit of Open Access, but it is not. It is simply taking a longer term view of the movement.

Bike Friendly Davis could be Friendlier

Davis is championed as one of, if not the, best biking cities in the US. See for example:

From my experience it certainly deserves this reputation. I live on one side of town and I work on the other side and bike to work whenever possible. I have tried to take as many different routes as I can to get to know the city. Over most of these routes, there are all sorts of bike-friendly features, like bike lanes, and traffic lights just for bikes, and even off road bike paths.

The off road bike paths are far and away the best feature of Davis in terms of biking. These wind their way through many communities and parks and generally make it incredibly pleasant and safe to bike. I see so many kids on these routes going to and from school and it must be nice to know your kid can bike around possibly without ever crossing a road.

Yet despite this I am struck by the unevenness of the bike friendly features across town. For example, there is only one good off road route that heads to the UC Davis campus from the South side of town. This is the South Davis bikeway, which is very nice and goes under I-80 and the railroad tracks. There is also a nice bike path on the West side of town (this one goes nearly all the way out to the next town, Winters). Unfortunately, from the North and East sides of town, there is no direct route to campus that is off road. So in these areas you see far fewer people commuting within town on their bikes. I am sure the limitation is that it is hard to build bike paths into older communities. But if Davis wants to really become the best bike town in the country, it should try to find a way.

In addition, there are many very simple things that could be done to make biking around town and commuting to town much more pleasant. For example, there is what could be a really nice off road bike path connecting Davis and Sacramento. The problem with this is that it is incredibly exposed – both to the sun and to I-80 (it runs right next to 80 for much of its route). In some sections, judicious tree and shrub planting could greatly reduce both forms of exposure. It is unclear to me why this has not been done. But I am sure that this explains why this bike route seems to be so poorly used. Who would go out of their way to commute on their bike when they are so exposed to one of the most highly travelled freeways in the area?

I am very grateful to live in a place with such bike friendly features. But it seems that a few adjustments here and there could get even more people onto their bikes and off of the roads.

Vice Provost of U. C. Davis on the wrong side of Open Access

Well, my first incredibly disappointing moment at U. C. Davis. My brother sent me this link about a letter to Congress from some provosts and deans trying to go backwards on the issue of Open Access to scientific publications.

See the press release here.

And one of the signatories is the Vice Provost for academic affairs at Davis, Barbara Horwitz. Their letter contains many misleading statements in my opinion and seems to be overly biased towards the anti Open Access side of the debate. First, they say

In fact, some studies have already shown that research intensive universities would have to pay considerably more to gain access to the same amount of research under an author-pays model than a subscription model.

Where is the citation for this? This is counter to intuition and on its face seems ridiculous to me. It requires some backing up with evidence, especially in a letter to congress.

They also claim:

The free posting of unedited author manuscripts by government agencies threatens the integrity of the scientific record, potentially undermines the publisher peer review process, and is not a smart use of funds that could be better used for research.

How on earth does posting of unedited manuscripts threaten the integrity of the scientific record? That is like saying scientists should not give talks on anything until they have published it, and then they should only quote from their published papers. Or, maybe scientists should not even discuss their work at all in public and should just present it through papers published in journals. I am astonished that an officer of my University would make such a statement.

Perhaps most amazingly, this collection of academic folks says:

As a member of the Senate Budget Committee, you are certainly sensitive to the various forces that shape and reshape the Federal budget from year to year. Recently, for example, we learned that the Biomolecular Interaction Network Database–the world’s largest free repository for proteomic data–lost its funding and curtailed its curation efforts.

This too appears to be almost absurd and certainly misleading. BIND is in the true tradition of Open Access – a database of proteomic information for the world to share. And these provosts and deans are trying to use its loss of funding as an argument for LESS OPEN ACCESS. How completely nonsensical is that? But even more incomprehensible, BIND is a CANADIAN database effort, supported by Genome Canada funding. So how this relates to the funding by the US Congress is beyond me.

This collection of provosts and deans appears to be trying to do a sleight of hand here with the details. I would be willing to wager that the driving force behind their letter is the desire to continue bringing in funds to their Societies or Universities that come from subscription based publishing. (Note it seems unlikely they are writing this letter as a statement of the official policies of their universities – certainly, I did not see any extensive discussion at Davis prior to Dr. Horwitz’s signing this letter). A little survey of the backgrounds of the letter writers is informative here. What I have found with a little googling is that many of the signatories have active leadership roles in publishing non Open Access journals. Robert R. Rich is the Editor in Chief of J. Immunology, which does not support Open Access. Kenneth L. Barker is the President of SEBM, a publisher of non open access scientific publications. Barbara A. Horwitz was the president of APS, which sponsored this press release and publishes many non Open Access journals. I am sure many of the others have some type of similar roles. It would have been nice for them to mention that in this press release.

To keep in that spirit, as I have said before, I am on the editorial board of PLoS Biology and PLoS Computational Biology and I support Open Access publishing completely. I do not always disclose this in discussions of Open Access but then again, I have never written a letter to congress making use of my position in a university to promote a position with such obvious direct benefit to myself.

Some interesting links and tidbits related to this article:

  • In their annual report from a few years ago, APS discusses how the DC Principles organization was founded specifically to counteract the Open Access movement.
  • Peter Horwitz writes about the letter more here
  • The APS we are discussing here is the American Physiological Society. Note it is NOT the same as the other APS commonly seen on science journals – the American Physical Society which is moving more to complete Open Access.

Note – thanks to T. Scott Plutchak at UAB for pointing out that it is possible to support Open Access without being a total jerk, and thus getting me to tone down some of the language from the original version of this post.

Good Open Access Biology Resources

Boring blog overall, but I wanted to put a collection of links here for information about Open Access, especially as it relates to biomedical literature. I will add more links to this over time, and welcome suggestions.

Royal Society just digs a deeper hole

The Royal Society has announced that they are making their full archive, including papers going back hundreds of years, available online for the first time. I read this line and thought – “Finally, the Royal Society is moving towards Open Access”. After all, the US National Academy of Sciences provides full and free access to all articles 6 months after publication.

Then I read the next sentence, which says that the Royal Society will provide this free access to their archive only until December:

And until December the archive is freely available to anyone on the internet to explore. ….

After December 2006 subscribers to our subscription packages (S, A and B) will enjoy privileged online access to the archives. Private researchers will also be able to access individual articles for a small fee per download.

The Royal Society appears to simply want to hold on to every last little shred of money they can get for things published originally hundreds of years ago. They could make a great contribution to the world by opening up their archive completely. But clearly, the Royal Society is not about making contributions to humanity. What they appear to be about is a scientific oligarchy that exists mostly to promote themselves and their friends. I would like to point out again that of 1316 fellows, 62 are women.

So this group of scientists appears to be trying to continue the bad traditions started hundreds of years ago, like excluding women from science. I looked for but could not find information on minorities but can only assume that their record in this area is even worse, as they do not discuss it on their web site.

Perhaps some day the UK public will wise up and stop giving money to this collection of Neanderthal wannabes.

The hypocrisy of most projects with "Open" data release

There has been a growing trend in biological research for scientists to release their data in some way or another prior to publication. This data release is meant to promote the advancement of science, and it frequently does. This is perhaps best seen with genome sequencing projects, such as the public version of the “Human Genome Project.” In many if not most cases, centers that do the bulk of the sequencing work release the sequence data for searching by others, even before publishing papers on their own data. In most cases, restrictions are placed on how the data can be used, but the data is still released for others to look at.

This is of course in contrast to how much of science works, with researchers keeping their data to themselves until they are ready to publish something. The genome centers who have made their data available prior to publication deserve some credit for this openness, especially since the data release in general by genome centers has been so far beyond what most biology researchers do. In fact, many of these centers go out of their way to promote getting such credit (they even got Clinton and Blair to play along). The best example of this was the public human genome project, which made multiple claims about how great they were for humanity for releasing the data “within 24 hours of gathering it.” This data release policy was captured in something that became known as the Bermuda Principles, due to a meeting that took place in Bermuda (see a nice summary of this by John Sulston here).

What is appalling to me, however, is that these same centers that try to take credit for their openness then turn around and usually publish their papers in non Open Access journals (for those who do not know, this means that one has to pay money, frequently enormous sums of money, just to read the paper). I do not understand this. A paper about an analysis someone did on a data set may in fact be more valuable to the community than the data itself. If the genome centers like TIGR, JGI, Sanger, Whitehead, etc. really wanted to be on the side of openness, they should stop publishing their papers in non Open Access journals. Unfortunately, these places publish very few of their papers in Open Access journals.

For example, the Joint Genome Institute (JGI), which I am now affiliated with, is continually showing two faces on this issue. On the one hand, they issue press release after press release regarding their release of data on various genome projects (e.g., here). That is fine, although a little over the top sometimes. But then they almost never publish any of their work in Open Access journals (e.g., see their latest press release on a paper published about a genome in Science, a non Open Access journal). Any taxpayers out there should be disappointed with this as the genome centers get TONS of money to carry out this work for the public benefit. And then for the papers on the work to be hidden behind huge subscription fees is a waste of your money.

This is particularly surprising coming from JGI since JGI is run directly by the Department of Energy (unlike most other centers which are either private or part of a university). Thus apparently DOE does not want to follow even the recommendations of congress and the senate regarding Open Access to publications. Nor does DOE apparently want to do the right thing by requiring their institutes to publish in Open Access journals. Too bad. Taxpayers hopefully will begin to get more and more upset about the waste of their money as these centers take enormous amounts of the federal science budget and convert it into documents that only a few can read.

The Blogger World Favors Open Access Publications

Well, even though the traditional press did not pick up the story about the Tetrahymena genome paper, it seems that lots of blogs and online news sources picked it up.

Here are some:

Maybe the press release from TIGR did not excite the “real” press too much, I do not know. But nevertheless, it is good to see people discussing the article and even better to see that the article is currently the #1 viewed article for the week at PLoS Biology. I assume that most of this comes from slashdot running an item about the article but I am not 100% sure.

I think the blogger world seems to run stories about Open Access publications much more than about non Open Access publications since they can read them freely. It would seem that the blogger world is helping to promote Open Access papers, and this may explain why in the recent past I have gotten much more response to Open Access papers than even to papers in Nature or Science.

It is so important for scientific research to reach all people, not just scientists who can afford subscriptions to journals. Thus a partnership between bloggers and open access publications seems perfect for the new way of doing science.

Tackling the hairy beast – Tetrahymena genome

Just thought I would put out a little self-promotional posting here on a paper we have published today on the genome of a very interesting organism called Tetrahymena thermophila. This organism is a single-celled eukaryote that lives in fresh water ponds.

This species has served as a powerful model organism for studies of the workings of eukaryotic cells. Studies of this species have led to some fundamental discoveries about how life works. For example, telomerase, the enzyme that helps keep the ends of linear chromosomes from degrading, was discovered in this species. This may not seem too important, but many folks think that degradation of chromosome ends in humans is involved in aging. Perhaps even more importantly (to me at least), studies of this species were fundamental to the discovery that RNA can be an enzyme. This discovery of catalytic RNA revolutionized our understanding of how cells work and how life evolved. Tom Cech and Sidney Altman were given the Nobel Prize in 1989 for this discovery.

Many (including myself) believe that having the genome sequence of this species will further spur research and its use as a model organism. In addition, we believe that some of the findings we report in our paper will further cement the importance of this species. For example, this species, though single celled, encodes nearly as many proteins as humans and possesses many processes and pathways shared with animals but missing from other model single celled species.

The project that led to this publication was undertaken while I was at TIGR (The Institute for Genomic Research) and involved a collaboration among people at dozens of research institutions around the world. It all started in 2001 when Ed Orias and his colleagues sought to see if anyone at TIGR would be interested in putting in a grant to sequence this species’ genome. I responded to the email saying I was interested, especially since I had interacted with multiple people who used this species as a model system (e.g., Laura Landweber at Princeton and Laura Katz at Smith). So I went to a FASEB meeting where the Tetrahymena Genome Steering Committee was meeting and discussed with them how TIGR might help sequence the genome. And after talking to other genome centers, they selected TIGR to put in a grant proposal with them.

We ended up getting funding from two grant proposals – one from NIGMS and the other from the NSF Microbial Genome Sequencing Program. The sequencing was done in a rapid burst at the new Joint Technology Center which TIGR shares with the Venter Institute. And then we spent ~1.5 years analyzing the sequence data (and assemblies) that came out and in the end we fortunately were able to get our paper into PLoS Biology, in my opinion the best place available to publish biology research.

Importantly, PLoS Biology is Open Access, which allows anyone anywhere to read about our work. This goes well with the free and open release we made of the genome sequence data. In fact, many people published papers on the genome before we did (sometimes scooping us). In the end, I accepted the risks of releasing the genome data with no restrictions in exchange for advancing research on this organism. I think this risk was well worth it as we still got our big paper published and the field has advanced more rapidly than if we had not released the data.

Other links that may be of interest to people:

Eisen, J., Coyne, R., Wu, M., Wu, D., Thiagarajan, M., Wortman, J., Badger, J., Ren, Q., Amedeo, P., Jones, K., Tallon, L., Delcher, A., Salzberg, S., Silva, J., Haas, B., Majoros, W., Farzad, M., Carlton, J., Smith, R., Garg, J., Pearlman, R., Karrer, K., Sun, L., Manning, G., Elde, N., Turkewitz, A., Asai, D., Wilkes, D., Wang, Y., Cai, H., Collins, K., Stewart, B., Lee, S., Wilamowska, K., Weinberg, Z., Ruzzo, W., Wloga, D., Gaertig, J., Frankel, J., Tsao, C., Gorovsky, M., Keeling, P., Waller, R., Patron, N., Cherry, J., Stover, N., Krieger, C., del Toro, C., Ryder, H., Williamson, S., Barbeau, R., Hamilton, E., & Orias, E. (2006). Macronuclear Genome Sequence of the Ciliate Tetrahymena thermophila, a Model Eukaryote. PLoS Biology, 4 (9). DOI: 10.1371/journal.pbio.0040286