Lessons Learned at the JGI Users Meeting

Well, the Joint Genome Institute (JGI) Users Meeting is over. For some rapid-fire notes on the meeting see the FriendFeed room here, where Jason Stajich, Tom Sharpton and I (and occasionally a few others) took notes on the whole thing as it was happening.

So – what were the lessons learned? What were the main points? If you want to know what we were writing about and don’t want to read the notes (they are quite fun actually), how about a word cloud? Well, here is one, which I made by taking the notes, editing out some of the names and other non-note text, and then pasting them into TagCrowd.com:

Not actually a bad representation of many of the topics.  
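
For anyone who wants to try the same trick on their own meeting notes without a web tool, here is a rough sketch of the counting step behind a word cloud. To be clear, this is not how TagCrowd works internally (I have no idea what it does under the hood); the function name `top_terms`, the tiny stopword list and the `exclude` argument for speaker names are all my own assumptions for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list (an assumption; real tools use longer ones).
STOPWORDS = {"the", "and", "for", "that", "with", "this", "was", "are", "but", "not", "from"}

def top_terms(text, exclude=(), limit=50, min_length=3):
    """Return the `limit` most frequent words in `text` as (word, count) pairs.

    `exclude` holds extra words to drop, e.g. speaker names edited out by hand.
    """
    # Lowercase the notes and pull out runs of letters (allowing ' and - inside words).
    words = re.findall(r"[a-z][a-z'-]*", text.lower())
    skip = STOPWORDS | {w.lower() for w in exclude}
    counts = Counter(w for w in words if len(w) >= min_length and w not in skip)
    return counts.most_common(limit)

if __name__ == "__main__":
    notes = "Genomes and more genomes: sequencing, sequencing, sequencing by Venter"
    for word, count in top_terms(notes, exclude={"Venter"}, limit=5):
        print(word, count)
```

The counts are what a word cloud turns into font sizes, which is also why a cloud shows the common points rather than the important ones.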
But you know, this does not per se capture the big points, just the common points. So I guess if you want the key points, you have to think a bit. First, you have to realize this is a JGI-focused meeting (which of course is the whole point of the meeting – it is the JGI User Meeting after all) and most of the people work with JGI in some way, so it is a bit hard to see the forest of genomics for the JGI trees. With that in mind, here are the top lessons I got out of the meeting after trying not to get too biased by the JGI focus (note – I have an Adjunct Appt. at JGI and do lots of things through there, so I am sure I cannot remove the JGI focus completely):
  1. NextGen sequencing continues to open up new windows into biology.
  2. Ecological and population genomics are truly the next big thing.
  3. Related to the above point, one of the next revolutions is going to be in high throughput phenotyping — after all, we cannot solve the genotype-phenotype problem when we only know the genotype.
  4. Model / reference organisms are still in, but every single organism on the planet is now in play.
  5. NextGen sequencing has completely outrun the ability of even good bioinformatics people to keep up with the data and to use it well.
  6. Related to the above point, the NextNextGen (e.g., Pacific Biosciences) seems to be barreling along and almost ready for prime time.  WTF are we going to do in terms of informatics then?
  7. Following up on the above point: we desperately need a MASSIVE effort in the development of tools for “normal” biologists to make better use of massive sequence databases.
  8. I am happy to report that just about everyone seems to be trying to use an evolutionary perspective as part of their work – especially in the selection of organisms for sequencing.
  9. I am sorry to report that many of the evolutionary “perspectives” are a bit off kilter.
  10. Sequencing is definitely not over – it is just getting started.
  11. People pushing the technology (e.g., George Church, Craig Venter) into new arenas definitely inspire the crowds.
  12. If you study a plant or an animal and are not studying the microbes that live with them, you are missing something.
  13. If you study ANY organism and are ignoring epigenetics, you are behind the curve.
  14. Open Access journals like PLoS Biology and PLoS One and Open Science got some huge props at this meeting, for example, with Venter showing many PLoS related images, many others showing stuff from OA journals, George Church talking about Open Source sequencers and Open personal genomics, and many referring to Open genomics databases.  Still some areas in need of improvement (e.g., not enough publishing in open access journals still) but the move in the direction of openness is great.
  15. Genome Centers definitely each have their own flavors with JGI positioning itself well in the niches of Ecology, Evolution, and Energy.
  16. Genome Centers are definitely going to have to reinvent themselves as the sequencing capacity for individual labs goes up and up with Illumina/Roche/ABI SOLiD continuing to spread.  Bigger, better, faster, more is one way they can stay ahead of the curve.
  17. Education and training did not get as much play during the talks as I would have hoped.  I mentioned it a bit but I do not recall too many other mentions.  Too bad, as the real potential for the democratization of sequencing comes from people getting trained in how to generate and handle the data and how to at least think about it even if they do not use it directly.
  18. Organismal biology is still desperately important in all of this work – if you know a lot about the organism as a whole then you already are a systems biologist.
  19. Genomic characterization of entire multi-organism systems is on the rise and this is not just microbiota stuff but also things like host-pathogen interactions and symbioses and so on.
  20. Reading DNA is being used in every which way imaginable.  Next up – writing DNA.  

Acid Rock Bacteria Genome …

Just a little plug for a new paper of which I am a co-author. This is a report on the analysis of the genome sequence of Acidithiobacillus ferrooxidans, which was just published in BMC Genomics (an open access journal, by the way). This paper was a long long time coming – the genome was sequenced when I was at TIGR many years ago (Herve Tettelin coordinated most of the work). Since I was interested in the biology of this bug I volunteered to help turn the sequence into a paper, but was pretty lame about doing that. Thankfully David Holmes and Jorge Valdes in Chile helped make a paper from the data and many additional analyses. Here is the abstract:

Background
Acidithiobacillus ferrooxidans is a major participant in consortia of microorganisms used for the industrial recovery of copper (bioleaching or biomining). It is a chemolithoautotrophic, γ-proteobacterium using energy from the oxidation of iron- and sulfur-containing minerals for growth. It thrives at extremely low pH (pH 1–2) and fixes both carbon and nitrogen from the atmosphere. It solubilizes copper and other metals from rocks and plays an important role in nutrient and metal biogeochemical cycling in acid environments. The lack of a well-developed system for genetic manipulation has prevented thorough exploration of its physiology. Also, confusion has been caused by prior metabolic models constructed based upon the examination of multiple, and sometimes distantly related, strains of the microorganism.

Results
The genome of the type strain A. ferrooxidans ATCC 23270 was sequenced and annotated to identify general features and provide a framework for in silico metabolic reconstruction. Earlier models of iron and sulfur oxidation, biofilm formation, quorum sensing, inorganic ion uptake, and amino acid metabolism are confirmed and extended. Initial models are presented for central carbon metabolism, anaerobic metabolism (including sulfur reduction, hydrogen metabolism and nitrogen fixation), stress responses, DNA repair, and metal and toxic compound fluxes.

Conclusion
Bioinformatics analysis provides a valuable platform for gene discovery and functional prediction that helps explain the activity of A. ferrooxidans in industrial bioleaching and its role as a primary producer in acidic environments. An analysis of the genome of the type strain provides a coherent view of its gene content and metabolic potential.

Open genetics: genome rearrangement videos and more

A little late I know, but I was going through my draft postings and I rediscovered this one from July. There is an interesting paper in PLoS Genetics by Aaron Darling et al. (full disclosure: Aaron is now working in my lab as a Post Doc … though I started writing this before I realized the paper was his). The paper is about genome rearrangement in bacterial populations (see Dynamics of Genome Rearrangement in Bacterial Populations). Though the science in the paper is quite interesting, the part I want to promote here is the fun genome rearrangement videos in the supplemental material.

The figure and video are from Darling AE, Miklós I, Ragan MA (2008) Dynamics of Genome Rearrangement in Bacterial Populations. PLoS Genet 4(7): e1000128. doi:10.1371/journal.pgen.1000128.

Cool Plant Comparative Genomics Resource: Phytozome

I spent the last few days at a “retreat” for the Joint Genome Institute and heard about a few things there worth sharing with everyone. I will try and post about some of them in the next few days. Here is one. The JGI and the Center for Integrative Genomics have made a pretty cool tool for comparative analyses of plant genomes. It is called Phytozome and has a variety of simple and nice features. JGI is doing more and more work on plant genomes as part of their energy research and I think Phytozome could turn into a good place to go to get the latest plant genome information. Go to http://www.phytozome.net to see the real thing.

Wanted: Microbial Genomics Lead at JGI

The Joint Genome Institute, where I work part of the time, is seeking a lead scientist for their Microbial Genomics work.

Sr. Research and Management Opportunity

The DOE Joint Genome Institute (JGI) in Walnut Creek, CA has an exciting Staff Scientist opportunity available. Will be responsible for leading the JGI’s Microbial Genome Program including the development of an independent research program in microbial genomics. Will manage all aspects of the program from application review through sequencing and genome analysis. Will be expected to collaborate with external scientific communities, present scientific data and publish results independently and with collaborators. Will also participate as a member of the JGI senior management team. This position reports to the Deputy Director of Scientific Programs.

For more information see here. If you want to play a leadership role in microbial genomics, this job is for you.

Combining two of my favorite things – chocolate and genomes

Well, the Mars company has really done it now (see Unwrapping the Chocolate Genome, from washingtonpost.com). They are planning to sequence the cacao genome. Genomes and chocolate. Man, are they going to get every bioinformatics person I know to apply to help out with this project …

Some little notes on the project:

  • They plan to release the data for public use: “Mars plans to make the research results free and accessible through the Public Intellectual Property Resource for Agriculture, a group that supports agricultural innovation, as they become available. The intent is to prevent opportunists from patenting the plant’s key genes.”
  • They are doing this in a collaboration with IBM
  • Good quote by Howard Shapiro: “We have the ability as a private company to take charge of the future,” Howard-Yana Shapiro, global director of plant science for Mars, said.

So, even though I pondered whether this was science by press release, a friend of mine convinced me it was not and this was just getting out the word on the project. For other details see

Connection between Video Games and Bioinformatics?

The Scientist Magazine has a nice piece on one of my favorite people in all of Science – Sean Eddy. In the article, they discuss how Sean is one of those bioinformatics folks who does not just hack together some code to do something but actually writes really good code for his programs. For those of you who do not know, Sean has made a whole collection of software tools for biologists (see his web site here). Perhaps the most widely used is HMMER, which is designed for making and using hidden Markov models. But there are some other good ones he has put out. My favorite is Forester, which was made by Christian Zmasek in his lab and is supposed to be available here, although the site is not working right now (NOTE – Christian has posted a new link for it in the comments). I like this because, well, it is software for “phylogenomic” analysis.

Anyway – it is a nice article about Sean, especially the parts talking about how his background in video games contributed to his success in bioinformatics. Back to something I said above, Sean is without a doubt one of my favorite people in science. There are many reasons for this but here are a few.

  • He is very open with ideas.

    Once, at a conference, I gave a talk on this bizarre new pattern we had found when we were comparing the genomes of E. coli and V. cholerae. We had found that when we did genome-level alignments of these species there was an X-like pattern (see our paper on this here). Anyway, in the talk I said something to the effect of “we have no friggin idea how these X-like alignments could be generated.” And Sean, I think in the question session, pointed out that in another paper of ours we had seen what appeared to be symmetric inversions occurring around the origin of replication and that could create the X-alignment. And lo and behold he was right. We got the paper, but in large part it was his push that got us looking at the inversions sooner than we would have.

  • He is very open with science.

    Most of Sean’s work is on the open side of science. Open Source software. Open Access publications. Open everything. And I should point out that it was a talk by Sean that catalyzed my conversion into an Open Science supporter. I was attending a meeting in Ft. Lauderdale to discuss data release policies for genome projects. This meeting led to the “Ft Lauderdale Agreement” on data release, by the way. At the meeting there were many genomics players like Eric Lander and Francis Collins who were trying to push for not completely open data release policies, where genome centers could release data but there would be constraints placed on the use of the data so that the genome centers would be the first to be able to publish genome-scale analysis of an organism’s genome sequence.

    At the time I was working at TIGR and I supported this notion of basically letting people search for a few genes of interest but preventing them from doing genome analyses. And then Sean got up and gave a talk and, well, blew my mind. I am sure I have notes somewhere from the meeting but basically what he said was – the genome projects’ whole point is to generate genome data for people to do genome-level analysis. So how on earth can we justify preventing exactly the type of analysis that the projects were designed to enable? He was not saying that we should not somehow protect the genome centers. What he was saying was that for the benefit of science, we need to find a way to allow people to do genome-level analyses immediately on the data. And he also said that the risks of releasing one’s data with no restrictions are much less than everyone claims. I think he convinced many people that genome centers needed to open up their data release policies a bit more. And he convinced me.

    And so I went home from that meeting and decided to release the data from as many of my genome projects as I could, with NO restrictions (e.g., this is what we did with Tetrahymena). And also, this newfound belief in openness helped pave the way for my conversion to being an Open Access publishing supporter.
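
Going back to the X-alignment story for a moment: the idea that inversions symmetric around the origin of replication produce an X is easy to see in a toy simulation. The sketch below is only an illustration under some loud assumptions, and it is not the analysis from our paper. The genome is a list of gene indices laid out linearly with the origin at the midpoint, the number of genes is even, every inversion is perfectly centered on the origin, and the function names (`symmetric_inversion`, `evolve`, `dot_plot`) are made up for this example.

```python
import random

def symmetric_inversion(genome, half_width):
    """Reverse the segment extending half_width genes to each side of the origin.

    The genome is laid out linearly with the origin of replication at the
    midpoint, so the reversed segment is genome[mid - half_width : mid + half_width].
    """
    mid = len(genome) // 2
    lo, hi = mid - half_width, mid + half_width
    return genome[:lo] + genome[lo:hi][::-1] + genome[hi:]

def evolve(n_genes, n_inversions, seed=0):
    """Apply random origin-centered inversions to the ancestral order 0..n_genes-1."""
    rng = random.Random(seed)
    genome = list(range(n_genes))
    for _ in range(n_inversions):
        genome = symmetric_inversion(genome, rng.randint(1, n_genes // 2))
    return genome

def dot_plot(genome):
    """Text dot plot: column = ancestral position, row = current position."""
    n = len(genome)
    return "\n".join(
        "".join("*" if genome[row] == col else "." for col in range(n))
        for row in range(n)
    )

if __name__ == "__main__":
    print(dot_plot(evolve(n_genes=20, n_inversions=6)))
```

Each such inversion sends a gene at position p to position n - 1 - p, so however many inversions pile up, every gene sits either where it started (the diagonal) or at its mirror image across the origin (the anti-diagonal). Real genomes are messier, with gene loss, insertions and off-center inversions blurring the picture, but that is the gist of the X.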

Anyway, glad to see Sean getting positive press. It is well deserved. Now off to play some video games.

Top 10 Things Francis Collins Might Do After NHGRI

As I have said, I think Francis Collins, upon leaving the NHGRI, is well set up to become an advisor to some presidential candidate (not that I would pick him as my advisor if I were running, but alas I am not running this time around).
But if he does not become Obama’s science sidekick or McCain’s genomics guru, well, there are lots of things he might do.  Here are some possibilities:
  • 1. The 1000 genome songs project.  He is already further along in this than the 1000 genomes project …
  • 2. Get a job at Craig Venter’s Synthetic Genomics.  Hey, Collins says he wants to try something new.  And Craig has a history of hiring people who used to work at funding agencies.
  • 3. Sequence Jesus’ genome.  More on this later.
  • 4. Run the World Anti Brain Doping Agency (WABDA).  We need a crusader to run the organization.
  • 5. Start a blog.  Hey, there are worse things one could do with free time.  Not many.  But there are some.  
  • 6. Start a genomic information anti-discrimination lobbying firm.  Like others in government, he really should try to make money off of legislation he helped pass.
  • 7. Dancing with the stars.  He could even sing along too.
  • 8. Start giving talks about genetic inferiority of various races and genders.  Or did someone who once ran NHGRI already do that? 
  • 9. Try and apply for some of NHGRI’s money.  Oh wait, he does not run a huge sequencing center, so he may not qualify.
  • 10. Make jewelry out of disk shaped beads.  Also known as sequinsing.  

Francis Collins Stepping Down from NHGRI

Just got forwarded this email from Francis Collins to multiple people. Collins is stepping down. I wonder what specifically triggered this … my guess is he is being recruited by one of the presidential candidates to be some sort of advisor. Nothing like having a prominent scientist who also is born again being on your team ….

From: FSCollins (NIH/NHGRI)
Date: Wed, May 28, 2008 at 11:58 AM
Subject: News

Dear friends and colleagues in the many wonderful team projects that I have had the privilege of being part of,

I am writing to let you know of my plans to step down August 1, 2008 as Director of the National Human Genome Research Institute, a position that has been both a joy and privilege to hold for the past 15 years.

The key to success is having wonderful scientific opportunities and stellar colleagues with whom to work. Many challenges lie ahead as genomics increasingly becomes a leading force in medicine, and I leave my position supremely confident that NHGRI and NIH will continue to achieve notable success in meeting them.

Looking back, I’m tremendously proud of our collective work in leading the Human Genome Project (HGP) to its successful conclusion in 2003, and of our wide range of large-scale projects that built upon the foundation laid by the HGP. Collectively, these projects and the priceless data they generated have transformed biomedical research and empowered researchers all around the world. I’m also proud of these projects’ commitments to protecting the privacy of genetic information and addressing the ethical, legal and social implications of genome research.

My decision to step down as NHGRI Director came only after much personal deliberation and was driven by a desire for an interval of time dedicated to writing, reflection and exploration of other professional opportunities in the public or private sectors. Rest assured that NHGRI’s leadership will be in good hands. Alan E. Guttmacher, M.D., the current deputy director of NHGRI, will become acting director of NHGRI on August 1, and Mark Guyer, Ph.D., the long-time director of the Division of Extramural Research will continue his able leadership. A formal search process for a permanent NHGRI director will get underway shortly.

Finally, I’d like to let each of you know that while I may be leaving the NHGRI Director’s office in search of other challenges, I will be cheering for the success of your dedicated and creative scientific achievements over the coming weeks, months, and years.

Keep up the good work!

Francis

Genomics Blogger Dissed by the New York Times

Well, the New York Times has an article today on Knome, a company that is charging people $350,000 to have their genome sequenced. They have two people signed up so far. Amy Harmon, the author of the Times article, interviewed me by email for her article, mostly because I had blogged extensively about the recent AGBT meeting where many of these sequencing companies had presented their latest goodies. But alas, being a genomics blogger apparently does not carry as much weight as (1) being Jim Watson, (2) paying $350,000 to have your genome sequenced, (3) running a genome center (Richard Gibbs) or (4) starting Knome (George Church). And so my quotes got left on the proverbial cutting room floor. Go figure.

Too bad for the Times, as I think my quotes were pretty good. Although in all honesty, they would not have gone too well with the final article, which has some nice cohesion to it.

I should note that, other than the lack of my quotes, one thing that got left out of the Times article is any discussion of the quality of the genome sequence that will be provided by Knome. One would hope that 350K would buy a high quality genome sequence but it is unclear how good it will be. I note they claim they will provide analysis of the genome too, but as with other companies that cater to the rich and famous, details are limited on their web site.

This reminds me of a funny scene I witnessed involving Craig Venter and a wealthy friend of his. This was the day before Craig’s personal genome was set to come out in PLoS Biology and the friend was asking Craig what it cost to sequence his genome. Craig said something to the effect of “many millions of dollars” and then Craig said, “but now it would cost only about 300K.” The friend was intrigued. And Craig asked if he was interested and the friend, without seeming in any way to be joking, said, “sure, sign me up.” I guess, when money is abundant, why not get your genome sequenced? Maybe I should have told Harmon this story. Then I might have gotten in the Times.