Eisen Lab Blog

The story behind the story of my new #PLoSOne paper on "Stalking the fourth domain of life" #metagenomics #fb

Well, here goes.

This is a post about a paper that has been a long, long time coming. Today, a paper of mine is being published in PLoS One. The paper is titled “Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees” and is available at http://dx.plos.org/10.1371/journal.pone.0018011 (or, if that link does not work, you can get a copy here). This paper represents something I started a long time ago, and I am going to try to describe the story behind it here.

I note that we are not doing a press release for the paper, for a few reasons. One of them is that, well, I am starting to hate press releases. So I guess this is kind of my press release, though it will be a bit longer than most. My key fear here is that somehow, in my communications with the press, in our text in the paper, or in this post, I will overstate our findings. Here is the punchline: we found some very phylogenetically novel forms of phylogenetic marker genes in metagenomic data. We do not have a conclusive explanation for the origin of these sequences. They may be from novel viruses. They may be ancient paralogs of the marker genes. Or they may be from a new branch of cellular organisms in the tree of life, distinct from bacteria, archaea and eukaryotes. I think most likely they are from novel viruses. But we just don’t know.

UPDATE: Am posting some links here to news stories/blogs about our paper

    First – a summary of what we did.

    In the paper, we searched through metagenomic data (sequences from environmental samples) for phylogenetically novel sequences for three standard phylogenetic marker genes (ss-rRNA, recA, rpoB). We focused on sequences from the Venter Global Ocean Sampling data set because, well, we started this analysis many years ago when that was the best data set available (more on this below). What we were looking for were evolutionary lineages of these genes that were separate from the branches that corresponded to the three known “Domains” of life (bacteria, archaea and eukaryotes).

    To search for such novel lineages in the metagenomic data, we built evolutionary trees using these genes where we included sequences from known organisms (and viruses) as well as sequences from metagenomic data. We then looked through the trees for groups that were both phylogenetically novel and included only environmental data (i.e., they were new compared to known organisms or viruses). This method did not work very well for rRNA sequences (largely because making high quality alignments of short phylogenetically novel rRNA sequences was difficult – more on this below). But with RecA and RpoB homologs we were able to generate what we believe to be robust phylogenetic trees. And in these trees we found evidence for phylogenetically very novel sequences in environmental data.
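    To make the search described above a bit more concrete, here is a minimal Python sketch of the clade-scanning step, under stated assumptions: the tree has already been built, metagenomic sequences are (hypothetically) labelled with an "ENV_" prefix, and "phylogenetically novel" is reduced to a single hypothetical cutoff on the length of the branch leading to a clade. It illustrates the idea only and is not the code behind the paper; all names here are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A rooted tree node; branch_length is the length of the branch above it."""
    name: str = ""
    branch_length: float = 0.0
    children: list = field(default_factory=list)


def leaf_names(node):
    """Collect the names of all leaves at or below this node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def is_environmental(name):
    # Hypothetical labelling convention: metagenomic sequences start with "ENV_".
    return name.startswith("ENV_")


def novel_environmental_clades(root, min_stem_length):
    """Report the topmost clades that contain only environmental sequences and
    sit on a stem branch at least min_stem_length long (a crude novelty proxy)."""
    hits = []

    def visit(node):
        names = leaf_names(node)
        if all(is_environmental(n) for n in names) and node.branch_length >= min_stem_length:
            hits.append(sorted(names))
            return  # report the whole clade, not its sub-clades
        for child in node.children:
            visit(child)

    for child in root.children:
        visit(child)
    return hits


tree = Node(children=[
    Node(branch_length=0.2, children=[Node("Bacteria_RecA", 0.1), Node("ENV_read17", 0.1)]),
    Node(branch_length=0.3, children=[Node("Archaea_RadA", 0.1), Node("Eukaryote_Rad51", 0.2)]),
    Node(branch_length=1.5, children=[Node("ENV_read3", 0.2), Node("ENV_read9", 0.3)]),
])
print(novel_environmental_clades(tree, min_stem_length=1.0))
# prints [['ENV_read3', 'ENV_read9']]
```

    The environmental read that sits inside a known clade (ENV_read17) is not reported, while the long, purely environmental branch is. A single length cutoff is of course far cruder than looking at the whole tree, alignment quality and support values.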

    Figure 1. Phylogenetic tree of the RecA superfamily. 

    Figure 3. Phylogenetic tree of the RpoB superfamily

    We then propose and discuss four potential mechanisms that could lead to the existence of such evolutionarily novel sequences. The two we consider most likely are the following:

    1. The sequences could be from novel viruses
    2. The sequences could be from a fourth major branch on the tree of life

    Unfortunately, we do not actually know the source of these sequences, so we cannot determine which of these explanations is correct. Obviously if there is a novel lineage of cellular organisms out there, well, that would be cool. But right now we have no evidence that this is what is going on. Personally, I think it is most likely that these novel sequences are from weird viruses. But as far as we can tell, they truly could be from a fourth major branch of cellular organisms. So even though we did not have the story completely pinned down, we decided to finally write up the paper to get other people to think about this issue.

    Below I give all sorts of other details about the project in the following areas:

    • The history of the project 
    • More detail on what is in the paper 
    • Follow up analysis and rapid posting with Google Knol 
    • Data deposition in Dryad 
    • Who was involved 
    • UPDATE: Funding for this work



    The history of the project

    Well, this is one of those projects for which the history is hard to explain. We started this work in 2004 when I was helping Venter and colleagues analyze the Sargasso Sea metagenome data. I was working at TIGR in 2003, which at the time was a sister institute to some of the institutes affiliated with the J. Craig Venter Institute (JCVI) (it was a complicated time). Craig had led a project to do a massive amount of shotgun sequencing of DNA isolated from the Sargasso Sea, which had been the site of many previous studies of uncultured microbes. And Craig, as well as some of the people working with him, including John Heidelberg who was at TIGR, had asked me to help in analysis of the data. So I eventually went to a meeting about the project and got involved. It was quite exciting and I put a lot of effort into helping analyze the data.

    As part of my work on the project, Martin Wu, Dongying Wu and I did a variety of phylogenetic studies of genes and gene families. One of these was a phylogenetic analysis of proteorhodopsin homologs showing massively more diversity in the Sargasso data than in the PCR experiments done by DeLong, Beja and others.

    Figure 7 from Venter et al. 2004. 

    We also did the first “phylotyping” in metagenomic data using genes other than rRNA. We built trees of bacterial ss-rRNAs, RecAs, RpoBs, HSP70s, EF-Tus and EF-Gs and then assigned each sequence to a phylum from the trees. In this analysis we found a variety of interesting things. 
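    For readers curious what assigning a sequence to a phylum from a tree can look like, below is a simplified Python sketch. The rule it uses (give an environmental sequence the phylum of the reference sequences in the smallest enclosing clade that contains any, and leave it unassigned if those references disagree) is an assumption chosen for illustration, not necessarily the procedure used in the Sargasso paper; the tree, the taxon names and the phylum_of mapping are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    children: list = field(default_factory=list)


def leaf_names(node):
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def phylotype(root, phylum_of):
    """Assign each environmental leaf (any leaf missing from phylum_of) the phylum
    of the reference sequences in its smallest enclosing clade that has any;
    if those references disagree, the leaf is left 'unassigned'."""
    calls = {}

    def walk(node, ancestors):
        if node.children:
            for child in node.children:
                walk(child, ancestors + [node])
            return
        if node.name in phylum_of:
            return  # reference sequence, nothing to assign
        for clade in reversed(ancestors):  # nearest ancestor first
            phyla = {phylum_of[n] for n in leaf_names(clade) if n in phylum_of}
            if phyla:
                calls[node.name] = phyla.pop() if len(phyla) == 1 else "unassigned"
                return
        calls[node.name] = "unassigned"

    walk(root, [])
    return calls


tree = Node(children=[
    Node(children=[Node("Escherichia_coli"), Node("env1")]),
    Node(children=[Node("Synechococcus"), Node("Prochlorococcus")]),
    Node("env2"),
])
phylum_of = {
    "Escherichia_coli": "Proteobacteria",
    "Synechococcus": "Cyanobacteria",
    "Prochlorococcus": "Cyanobacteria",
}
print(phylotype(tree, phylum_of))
# prints {'env1': 'Proteobacteria', 'env2': 'unassigned'}
```

    The "unassigned" outcome for env2 is the interesting case: a sequence that does not fall inside any known group is exactly the kind of thing that later got me thinking about completely new branches.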

    Figure 6 from Venter et al. 2004. 

    One thing I did not include in the Sargasso paper was an analysis I did of RecA homologs in which I tried to include ALL RecA-like genes from bacteria, archaea, eukaryotes and viruses. The trees I made were a bit unusual, but I was not sure that my alignments were robust or that I had found all the RecA-like genes of interest, so I did not even show this to Craig et al. at the time.
    UPDATE: I note – our work on this project was supported by a grant from the NSF Assembling the Tree of Life program that was awarded to me and Naomi Ward and Karen Nelson. Those funds supported the development of many of the informatics tools we used in this analysis and Martin and Dongying were both working on that project.

    After the Sargasso paper was published in 2004, though, I continued to stew over the RecA trees. And I wondered: what if, instead of trying to classify bacterial sequences into phyla, I tried to look for RecAs, rRNAs and other genes that were completely new branches in the tree of life? I got the chance to play with this concept again when Venter and crew asked me to help analyze the data coming out of the Global Ocean Sampling project. Again, this project was very exciting and interesting.


    As part of the project, I helped Shibu Yooseph and others look into whether the GOS data revealed any completely new types of functionally interesting genes, much like I had shown for proteorhodopsin in the Sargasso data.  


    Figure 7 from Yooseph et al. 2007. Phylogenies Illustrating the Diversity Added by GOS Data to Known Families That We Examined 

    And again my mind started wandering towards the question: “OK, so if there are all these very unusual and novel functionally interesting genes, what about looking for unusual and very novel phylogenetic marker genes?” So finally, I got back to work on the issue.

    And so I built a better RecA tree by first pulling out all possible homologs of RecA and RecA-like proteins from the GOS data and then building an alignment and a tree. And there they were. Some very f*%&$ novel RecAs, distinct from any previously known RecA-like proteins as far as I could tell. And so with help from Dongying and the JCVI crew, we started building a story about novel RecAs. And then we looked at RpoBs. And found novel ones too. In mid-2006, while Shibu and Doug worked on their papers that were to be submitted to PLoS Biology and I worked on a review paper too, I told Emma Hill (who has since changed her name to Emma Ganley due to some sort of wedding thing) at PLoS Biology about an analysis that was consistent with the existence of a fourth domain of life. No overstating our findings really, just that we found very novel phylogenetic marker genes and that I was working on a paper on it. But alas I never got it done, though I was happy to have convinced Venter to send the GOS papers to PLoS Biology, and I think the papers that came out were good. Among them were my review (Environmental Shotgun Sequencing: Its Potential and Challenges for Studying the Hidden World of Microbes), Doug Rusch’s diversity paper (The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through Eastern Tropical Pacific) and Shibu’s protein family paper (The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of Protein Families), as well as many others, as part of the Ocean Metagenomics Collection at PLoS.

    And in the midst of all of this, we had our first child and we wanted to move back to Northern California to be closer to family (my wife’s family is all in the Bay Area and my sister and brother Michael were in N. Cal too). So I applied for jobs, eventually took a job at UC Davis, and we moved to Davis. Needless to say, all of that put a bit of a crimp in my work productivity. And once I was up and running at Davis, it just took a long time to get back to searching for novel deep branches in the tree of life. But finally, we did it (with periodic prodding from Craig Venter). We put together a paper and submitted it to PLoS One in October. The reviews were very positive and enormously helpful. We finally submitted a revision in January, and the paper was officially accepted in February 2011. Only some seven years after my first work on the project. Whew.

    More detail of what is in the paper
    Well, I am going to be posting here some additional detail on what is in the paper.



    Why we punted on analysis of very novel rRNAs.

    The problem with rRNA is that the sequences that come from environmental samples are not complete (i.e., they only correspond to portions of the rRNA genes). Unfortunately, this makes a key step in phylogenetic analysis difficult: the alignment of sequences. We actually found about 200 rRNA sequences that seemed unusual in a phylogenetic sense. However, we were not convinced that the alignments of these fragments to other rRNAs were robust. This is because rRNAs are best aligned using the base-pairing secondary structure of the molecule, not the base sequence (i.e., primary structure) alone.

    With only rRNA fragments, we could not use the secondary structure to do the alignments, because you need the whole molecule to determine the best folding. Combined with the fact that we were searching for very distantly related ribosomal RNAs, which would be hard to align even if we had the whole molecule, we were stuck for a bit. It seemed impossible to use rRNA to look for really novel organisms.
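    A toy illustration of why fragments are a problem for structure-based alignment: in dot-bracket notation each base pair is written as a matching pair of brackets, and a fragment that covers only part of the molecule can leave every pairing partner outside the window. The Python sketch below uses a made-up 19-nucleotide structure (not a real rRNA), and the helper names are hypothetical.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs from a dot-bracket secondary structure."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pairs_seen_by_fragment(structure, start, end):
    """Split the pairs touching the fragment [start, end) into pairs whose two
    partners both lie in the fragment and pairs whose partner lies outside it."""
    complete, broken = [], []
    for i, j in base_pairs(structure):
        i_in, j_in = start <= i < end, start <= j < end
        if i_in and j_in:
            complete.append((i, j))
        elif i_in or j_in:
            broken.append((i, j))
    return complete, broken


# Toy 19-nt stem-loop structure, not a real rRNA.
structure = "((((..((...))..))))"
print(pairs_seen_by_fragment(structure, 0, 10))
# prints ([], [(0, 18), (1, 17), (2, 16), (3, 15), (6, 12), (7, 11)])
print(pairs_seen_by_fragment(structure, 5, 14))
# prints ([(6, 12), (7, 11)], [])
```

    A read covering the first half of this toy molecule contains six paired bases but not a single complete pair, so there is nothing for a structure-aware aligner to anchor on. Real rRNAs are far larger, but the same thing happens to short reads that span one side of long-range helices.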
    So that is when we turned to other genes. The key for this is that there are protein coding genes that are universal and that for known organisms show similar patterns to rRNA in trees. In fact, in 1995 I wrote a paper showing that trees of RecA were very similar to trees of rRNA. RpoB is also considered a very robust phylogenetic marker. For organisms that we have in the lab (i.e., cultured) – many people use these other genes for phylogenetic analysis. rRNA has been very important in part because of the ease with which one can PCR amplify it from environmental samples and the fact that it is very hard to PCR amplify protein coding genes from the environment. Metagenomics changes this. With random sequencing, you get data from all genes. This means we can pick and choose genes to analyze for phylogenetic analysis and do not have to rely on rRNA.

    So we went after RecA first, because it has been shown to be a good phylogenetic marker for studies of the tree of life. And we found some very novel branches in the RecA tree. And after analyzing these and convincing ourselves that they were indeed phylogenetically very novel we went after RpoB. And also found very novel branches.

    So I think the phylogenetic analysis is very robust.

    RecA and RpoB as phylogenetic markers

    Many genes have been used as alternatives to rRNA genes to build “Trees of Life” that include all organisms. Each has its own advantages and drawbacks. Two commonly used ones are the RecA and RpoB superfamilies.

    The many possible explanations for finding novel forms of phylogenetic marker genes

    The phylogenetically novel marker genes we found could have many explanations: they could be ancient paralogs of these genes (but not found in any genomes we have available), they could be from viruses, or they could be from a novel branch on the tree of life. Or our trees could be bad. We think the latter is somewhat unlikely, as our analysis has many lines of support. For example, our RecA trees are very similar to those from a comprehensive study from M. Nei’s lab, which did not include the metagenomic data. But I guess it is still possible that our trees are biased in some way (e.g., by long branch attraction or bad alignments).

    Follow up analysis and rapid posting via Google Knol

    Amazingly, and a bit sadly, I think we rushed the paper out. We had done an analysis of the locations from which these novel RecA and RpoB sequences came, but somehow, partly by accident, we left it out in our final push to get the paper out. I will be posting this information as soon as possible here and on the PLoS One site.

    In addition, after submitting the revision of our paper, we realized that we might be able to do a deeper analysis on one aspect of the work – how RpoB homologs from unusual DNA viruses compared to our novel sequences. We had included some RpoBs from DNA viruses in our analyses but not all that were available. So Dongying Wu did a very rapid additional analysis, adding some additional RpoB homologs to our alignment and making a tree of them. We then wrote a Google Knol about this new tree and submitted the Knol to PLoS Currents “Tree of Life” where it is currently in review. We are publishing the preprint of this Knol to make it available to all even while it is in review.


    Figure 2 from Wu and Eisen (submitted). 

    Data availability

    There is a move afoot to make sure all data and tools associated with publications are readily available. We used publicly available sequence data and, as much as possible, publicly available tools for our work. We are trying to release as much as possible to allow people to re-analyze our work and to do any of the work themselves. We have therefore made use of the Dryad Data deposition service to post some of this material (see http://datadryad.org/handle/10255/dryad.8385).

    Who was involved

    • Dongying Wu, a brilliant “Project Scientist” in my lab, led the project (Project Scientist is one of the UC positions and is similar to what others call “Senior Scientist”). Dongying is simply one of the best bioinformaticians/computational biologists I have ever met. He was first author on many key papers from my lab, including the Genomic Encyclopedia paper that came out last year and the glassy-winged sharpshooter symbionts paper that came out a few years ago. Dongying worked in my group at TIGR, moved with me to UC Davis, and currently splits his time between UC Davis and the DOE Joint Genome Institute. 
    • Martin Wu. Martin is an Assistant Professor at the University of Virginia. Prior to that he was a Project Scientist in my lab at Davis and a post-doc in my lab at TIGR. He is also a phenomenal bioinformatician / computational biologist. He developed the AMPHORA software in my lab and also led many genome projects (back when sequencing a genome was hard …), including that of the first Wolbachia genome and that of a very unusual bug, Carboxydothermus hydrogenoformans. Martin helped with some of the genome analyses as part of this work. 
    • Aaron Halpern, Doug Rusch and Shibu Yooseph are all bioinformaticians from the J. Craig Venter Institute (Aaron is no longer there). All three helped with different aspects of dealing with and analyzing the GOS data and all three have been remarkably patient as this work dragged on and on. 
    • Marv Frazier from the JCVI was helpful in the initial set up and conceptualization of the project. 
    • J. Craig Venter is, well, Craig Venter, and he was involved in multiple aspects of the project including thinking about how and where to look for unusual sequences and interpreting some of the results.

    UPDATE: Funding for this work

    Most of my lab’s early work on this project was supported by a grant we had from the Assembling the Tree of Life program at the National Science Foundation (grant 0228651 to me and Naomi Ward). In that project we were working on sequencing and analyzing genomes from phyla of bacteria for which genomes were not available at the time. As part of this work we were designing methods to build phylogenetic trees from metagenomic data, because we thought that our new genomes would be very useful in helping analyze metagenomic reads and figure out from which phyla they came. Later work on the project was supported by a grant to me, Jessica Green and Katie Pollard from the Gordon and Betty Moore Foundation (grant 1660).

    Some questions that might be asked and some answers (based in part on questions I have gotten from reporters). Note: if you have other questions, please post them here or on the PLoS One site for the paper.

    • Why no press release? Well, in part, because I sent information too late (shocking, I know) to the Davis Press Office. But also because they have suddenly gotten busy with some Japan earthquake related actions. But also because, well, I really hate a lot of press releases. And finally, my brother had dinner with Carl Zimmer recently and apparently they discussed the possibility of having no press releases associated with papers. So here goes …. 
    • Really – what took so long? I would like to say the US Government made us hold back on publishing this until they could look into whether Venter collected ocean data from Roswell, NM or not. But really, the story above is true. We just did not get it done earlier. 
    • Why do you not know the source of the DNA (i.e., cells, viruses, etc.)? This is why there was a six-year wait between discovery and writing this up. We kept thinking we would be able to find the organisms, but after I moved from TIGR and started a new job, we just never got around to tracking down the source. So we decided to write up the paper and open the hunt for the source to others. 
    • Why did you not rename the Unknown 2 group in the RecA tree? We should have renamed our group “Thaumarchaeota” or something like that. When we did the initial analysis our group was novel. And then a few years ago a few groups obtained data from what is thought to be the third major lineage of Archaea – referred to by some as Thaumarchaeota. This is to go with the Euryarchaeota and Crenarchaeota. See http://www.ncbi.nlm.nih.gov/pubmed/20598889 for example. 
    • One of the clades in the RecA tree (XRCC2) seems out of place phylogenetically. I can see how that is confusing. The XRCC2 clade is very weird and hard to figure out. It is not the “normal” eukaryotic genes; those are the Rad51/DMC1 genes. One complication with the RecA family is that there have been gene duplication events in addition to speciation events. And thus eukaryotes have Rad51, DMC1, Rad51B, Rad51C, Rad57, XRCC3 and XRCC2. We tried to figure out where the XRCC2 group should go, but it was just hard to place. The statistical support for its position (we used a method called bootstrapping) is low (note the lack of a number on the node where the branch leading to XRCC2 connects to the base of the tree). Most likely that group should be placed with some of the other eukaryotic groups. However, it seems likely that there was a duplication in the lineage leading up to the ancestor of eukaryotes and archaea (some studies have indicated they share a common ancestor to the exclusion of bacteria). Such a duplication would explain why basically all archaea have a RadA and a RadB and all / most eukaryotes have multiple paralogs as well. 
    • The Unknown 1 group in the RecA tree seems to group with phage. What can you say about that? We think Unknown 1 is potentially of viral origin but still cannot tell. The fact that it clusters with RecA superfamily members from phage suggests this, but it is distant enough from known phage for us to not be confident in any predicted origin. As for whether such viral genes are derived forms or represent an independent branch, this is one of the big questions about viruses these days. Many viruses encode homologs of “housekeeping” genes found across bacteria, archaea and eukaryotes. And in many cases the viral versions of these genes appear to be phylogenetically very novel. This is why the people studying mimivirus (which we refer to) suggest some viruses may in fact represent a fourth branch on the tree of life. It is possible that some viruses are in fact reduced forms of what were once cellular organisms, somewhat akin to parasitic intracellular bacteria. 
    • Why are these phylogenetically novel sequences so low in abundance? This is a key question. I think it would be easy to come up with a theory for these being rare or for these being common. They might be rare if their niche is very limited today. Or they might be rare because they could not compete well with other organisms. Or they could be rare because they require some unusual interactions with other taxa. In addition, we have only looked carefully at ocean water samples. If these are common somewhere else (e.g., hot springs, deep subsurface, etc.) we would not yet have figured that out. We are looking at additional metagenomic data right now to see if we can find any locations where relatives of these genes are more common.
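    Since bootstrapping comes up in the XRCC2 answer above, here is a minimal Python sketch of the idea: resample alignment columns with replacement, rebuild the tree for each replicate, and report the fraction of replicates in which a given clade reappears. Real analyses rebuild the full tree with a proper phylogenetic method for every replicate; here that step is a hypothetical build_clades callback, and the toy closest_pair_clade function (which only returns the closest pair of sequences by Hamming distance) stands in for it. The alignment is made up.

```python
import random
from itertools import combinations


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, clade, build_clades, n_reps=100, seed=0):
    """Fraction of replicate alignments whose inferred tree contains `clade`.

    build_clades stands in for a real tree-building program: given an alignment
    it must return the set of clades (frozensets of sequence names) in its tree."""
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = sum(target in build_clades(bootstrap_replicate(alignment, rng))
               for _ in range(n_reps))
    return hits / n_reps


def closest_pair_clade(alignment):
    """Toy stand-in for tree building: returns only the closest pair (Hamming distance)."""
    def distance(a, b):
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    a, b = min(combinations(sorted(alignment), 2), key=lambda p: distance(*p))
    return {frozenset((a, b))}


alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAC",
    "C": "CATGCATGCA",
    "D": "GTACGTACGT",
}
print(bootstrap_support(alignment, {"A", "B"}, closest_pair_clade))
# prints 1.0
```

    In this toy alignment A and B are identical and differ from C and D at every column, so every replicate recovers the A+B pair and the support is 1.0. With real data, support falls between 0 and 1, and a low value, like the one for the XRCC2 node, means the placement does not hold up when the columns are resampled.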

    Some related papers by others worth looking at

    Some related papers by me possibly worth looking at

    Some related blog posts I have written over the years


      Dongying Wu, Martin Wu, Aaron Halpern, Douglas B. Rusch, Shibu Yooseph, Marvin Frazier, J. Craig Venter, & Jonathan A. Eisen (2011). Stalking the Fourth Domain in Metagenomic Data: Searching for, Discovering, and Interpreting Novel, Deep Branches in Marker Gene Phylogenetic Trees. PLoS One, 6 (3). DOI: 10.1371/journal.pone.0018011

      I know – Ego Blogging is so 2010 – But I won. I won. I won. (The Ben Franklin Award …)

      OK, so the title is a bit much. But I am really happy that I won this year’s Benjamin Franklin Award, given out by the Bioinformatics Organization. For more on this see …
      I found out a few days ago and am rearranging some things to go to Boston April 13 for the award ceremony at the Bio-IT World Conference and Expo.  
      From the Bioinformatics Organization web site:

      Benjamin Franklin (1706-1790) was one of the most remarkable men of his time. Scientist, inventor, statesman, he freely and openly shared his ideas and refused to patent his inventions. It is the opinion of the founders of the Bioinformatics Organization, Inc. that he embodied the best traits of a scientist, and we seek to honor those who share these virtues

      The Benjamin Franklin Award for Open Access in the Life Sciences is a humanitarian/bioethics award presented annually by this organization to an individual who has, in his or her practice, promoted free and open access to the materials and methods used in the life sciences.

      I like the general sentiment very much.  And perhaps more important – the list of prior winners is an impressive crew.  Again, from the Bioinformatics Organization web site:

      Note – my brother won the first one.

      Anyway – am thinking about what to say in the awards ceremony.  Probably going to say something about how openness is more than about being at no charge.  Also I might discuss how it would be good to have a female winner one of these days.  Speaking of which – maybe people can give suggestions for women to nominate for next year …

      UPDATE 9/25/12: See this Friendfeed discussion for some more comments about possible female candidates. I have copied the text below in case Friendfeed disappears: “maybe people can give suggestions for women to nominate for next year …”. OK, I’ll start: how about Rosie Redfield? If it weren’t for the Life Sciences focus I’d also suggest Heather Joseph. Speaking of Heathers, one H. Piwowar springs to mind whenever Open Foo is mentioned. – Bill Hooker heh, that would be cool someday 🙂 For now, how about Helen M. Berman, Judith A. Blake, Maryann E. Martone, Catherine Ball, or other pioneers in open databases? – Heather Piwowar Janet Thornton. – Heather Piwowar Agreed! – Egon Willighagen In an award speech at ISMB 2005, Janet Thornton expressed gratitude she was able to take years out-with-family and then pick up again. Inspirational. Not relevant for the Ben Franklin award, but wanted to mention it because it made such an impact. – Heather Piwowar

      J. Craig Venter Institute, UCSD, Beyond the PDF, #UCDavis leadership, all in one trip

      A wee bit late, but I thought I would give an update on a recent trip. In January I went on a little trip to Southern California. The trip started with a simple plan: Susan Golden invited me to give a talk at UCSD. After my usual complications in planning, I finally agreed to a date (1/19) after finding out that Phil Bourne, Editor-in-Chief of PLoS Computational Biology, was helping organize a meeting starting 1/19 entitled “Beyond the PDF” to discuss the future of scientific publishing. So this seemed like a perfect mix. Go down to UCSD for one thing and stay for another. Short flight and easy to change. Seemed ideal.

      Of course I had to make it more complex so I contacted some friends at the J. Craig Venter Institute to see if they would be around and unfortunately my friend Jeff Hoffman was not going to be around.  But he connected me to Craig Venter and his wife Heather Kowalski and though what I really wanted was to just see if they would be around for a visit or dinner – I ended up getting roped into giving a talk there on the 18th.


      I note – for those not in the loop – I worked at The Institute for Genomic Research (TIGR) for eight years or so before moving to Davis.  TIGR was founded by Craig and Craig was the head when I interviewed for a job there in 1998.  But between when I interviewed and when I showed up, Craig had left to start Celera, and his then wife Claire Fraser took over.  The entire time I was there, until 2006, Claire was the president of TIGR.  However, I did work with Craig on and off on various projects over the years and also started to work with many of the people at the J. Craig Venter Institute.  When Craig left Celera there was a lot of tension between TIGR and JCVI and things got a bit nasty at times.  Someday I will write more about my thoughts on all that went on but for the purposes of this post, all that is needed is to say that I always got along well with many of the people at the JCVI, including Craig.

      Day 1 (Jan 18) – Davis to San Diego to JCVI 
      So after even more complexities in planning I had a plan. I flew down on the morning of the 18th from Sacramento and took a cab to my hotel (The Estancia). Just staying at the Estancia was a bit of a complication. You see, originally I had asked to be put up at the La Jolla Shores hotel because it is on the ocean and, well, we don’t get much ocean in Davis, CA. Plus my kids were going to come and they wanted to stay near the water. Then my wife and kids bailed, so I now had a little less reason to stay at the La Jolla Shores.

      So I asked the assistant who was coordinating travel to switch me to the Estancia. Alas, she told me they were not having any visitors stay at the Estancia because there was no food available there.  This seemed weird and after asking around and then even calling the hotel I found out there were two restaurants there – one open for breakfast and lunch and another open for dinner.  So I told this to the assistant.  She then told me that was not true.  I had to basically beg to get moved to the Estancia.  I wanted to be there because it was walking distance to UCSD and was where many of the people for the Beyond the PDF meeting were staying.  Anyway – I finally got a reservation there.

      So to continue – I headed to the Estancia from the airport.  I dropped my stuff and I tried to mooch a ride from some of my friends/contacts at J. Craig Venter Institute but they were not answering.  So I took a cab.  I got there, had a decent chat with Bob Friedman, and then went to set up to give my talk. Got set up, Craig came in with his new dog Darwin, and I talked.

      Here are the slides (with audio but not yet synched as I write this)

      After my talk I met with a few people around JCVI including Andy Allen and Roger Lasken. Got a tour of some of their toys there. And I saw a variety of old friends. And then I went out to dinner at Zenbu Sushi with some of the crew there. Ham Smith gave me a ride. As usual, he was driving a very long American car. He is quite tall. But he also seems to like the classic American extra long cars. We drove through traffic in La Jolla and talked about microbes and California. Then we got to the restaurant, where eventually Clyde Hutchison, Craig and Craig’s wife Heather showed up and we had a very nice dinner. I then mooched a ride back to my hotel with Ham. I note Ham gave me some grief about my recent haircut, as I noted on twitter later: “Off to Salk/UCSD this AM – Tues spent PM at J. Craig Venter Inst.: gave talk, saw cool things/people & got dissed on haircut by Nobelist”

      When I got back to the hotel I found that some of the people who were in town for the “Beyond the PDF” meeting were at the bar.  So, instead of working on my talk, I went to the bar and hung out with some of the publishing folks.  And finally I crashed.

      Day 2 (Jan 19). UCSD.

      I got up early in the AM and had breakfast at the hotel (yes, indeed, they had food there).  I thought I saw the UC Davis Chancellor Linda Katehi at breakfast but figured I must have been seeing things.  Then I walked from the hotel all the way across the street to the Salk Institute where my first meeting of the day was.  It was so close that I had 30 minutes or so to kill so I walked down towards the beach.  I believe I got close to it but it was so foggy I could only see about 30 feet in front of me and though I could hear lots of waves crashing I did not actually see the beach.

      I then returned and found my way to Joe Ecker’s office. Had a great meeting with Joe (he does just phenomenally cool stuff on Arabidopsis and apparently on stem cells now too). I have known him for many years, since we worked together on sequencing and analyzing the Arabidopsis genome (I helped in analysis of the genome when I was at TIGR) (I note: the genome paper was supposed to be freely available forever at Nature’s web site, but as I write this it is not free). Then my host, Susan Golden, picked me up at Joe’s office and we walked, in partial silence (she had laryngitis), from Salk to her lab. Susan works on cyanobacteria and has done some fascinating work on circadian rhythms in these species. I spent an hour or so with her lab in their lab meeting talking about science and then went off to lunch. I had lunch at the UCSD Faculty Club with Larry Smarr, who I have interacted with in a variety of ways for many years. We spent most of lunch talking about personal data recording (e.g., medical tests, real time monitors, etc.) (see one of his talks about his own personal data here). We even went back to his office afterwards and I got to see some of his personal data and how he has been trying to integrate genomic information with medical records and lab tests. Afterwards I drifted back to Susan Golden’s office, called her up, and she came to take me to my next meetings with Joseph Pogliano and then Kit Pogliano. Both are doing very very cool experimental microbial studies that overlap a bit with some of the things my lab has studied (e.g., Joe works on bacterial actin-like proteins and Kit works on sporulation). After meeting with them Susan and I headed over to the seminar room, where I had 30 minutes or so to get my thoughts in order and then gave my talk. It was VERY similar to the talk I gave at the Venter Institute, but cleaned up a little bit with the Venter/TIGR jokes removed.

      After my talk, Susan and her husband Jim drove me to dinner, where I was very pleased to find out Susan had tracked down a friend of mine from grad school, Kristin Baldwin, who is now on the faculty at the Scripps Research Institute.  It was great to see Kristin for the first time in 15 or so years.  After dinner I went back to the hotel and bumped into Pam Ronald (a friend and colleague of mine from Davis) checking into the hotel – she was in town for the Plant and Animal Genomics meeting.  I then went to the bar and discovered the entire Beyond the PDF meeting crew there.  I lingered a bit and then finally went to sleep.

      Day 3 (1/20): UC Davis leaders and Beyond the PDF

      Running out of steam here.  So this section will be a bit shorter than the other days.

      The highlight of the day was just after breakfast.  I walked out of the hotel to head over to campus and this time I was certain that I saw the new UC Davis Chancellor Linda Katehi there.  So I went up and said hello, reminding her who I was (we have met a few times now, but you never know).  She introduced me to her husband and to the new UC Davis Provost, Ralph Hexter.  I asked what they were doing at UCSD and they said there was a UC Regents meeting.  Not one to miss a chance to hang out with the UC Davis leaders, I walked with them to campus.  I spent the whole walk chatting with Hexter, who, I note, was very very impressive.  I must say, I am a massive fan of Katehi.  Every interaction I have had with her has left me enormously impressed.  And I really like what she is trying to do at Davis.  And one other thing that impresses me is who she has been hiring into leadership positions at Davis.  Hexter seems perfect as provost right now at Davis.  Humanities prof.  Ex-university president (Hampshire College) and ex-Dean at UC Berkeley.  I talked with him for 25 or so minutes on the walk and was left thinking UC Davis is in good hands.  Another recent hire at Davis is the Vice Chancellor for Research, Harris Lewin, who I am also very impressed with.

      Anyway, after walking with them to the Regents meeting I then headed off to the Beyond the PDF meeting.  Since I am running out of steam here I call your attention to this site with more information about that meeting: Beyond the PDF.  The meeting that day was OK.  Saw / met lots of interesting people.  The best part was hanging out with people like Kay Thaney, who I never get to see enough.

      And then I went to dinner at the La Jolla Shores Hotel.  Then I went back to the Estancia and went to sleep.

      Day 4 (1/21): Beyond the PDF and home

      Well, completely out of steam now.  So all I am going to say is that I went to the Beyond the PDF meeting for the AM and then headed off to the airport to go home.  I think two talks/visits plus one workshop was a bit much for my brain to handle.  Thus I am only now getting around to writing up some notes.

      ————–

      Please help keep the pressure on Nature Publishing Group to restore free access to genome papers #opengate

      Well, I realize of course some things take time, but I cannot imagine it is that hard to restore free access to all papers reporting genome sequence data.  Nature had promised to do this when many of these papers were published, but recently I noticed that this was not being done.  For some background see:

      So today I browsed around to see if access had been restored to these genome papers.  And alas, for many, it had not.  For example, the Plasmodium genome paper is not available.

      The Shewanella genome paper is also not available.  I know things take time.  But I note that I have previously pointed out failings in free access to Nature, and the problem was seemingly fixed, but not permanently.  They really need to fix their system so that this stops happening.  So I am going to keep at them.  A bit tongue in cheek I have called this #opengate but perhaps I should call it Openomics?

      Upcoming conference: DOE-JGI User Meeting: Energy, Genomes, Environment, Microbes, Trees and more …

      Just got this email I thought would be useful to share (with a few edits):

      Still time to register for JGI “Genomics of Energy & Environment” 6th Annual User Meeting

      The DOE Joint Genome Institute (JGI) hosts the “Genomics of Energy & Environment” 6th Annual User Meeting (March 22-24, 2011):
      Here

      Explore the agenda of presentations, poster sessions, and valuable hands-on workshops (see below).

      The meeting brings together, at the Walnut Creek Marriott, people working on:

      • Synthetic Biology
      • Ecogenomics and Ecoresilience of the Gulf Oil Spill
      • Hardware and Software Trends in Genomics Supercomputing
      • Computational Approaches to Massive Short Read Metagenomic Data Sets
      • Genomics of Biofuel Crops
      • Behavioral Genetics of Pollinating Bees
      • Microbiome Analyses from Humans to Shipworms
      • Metatranscriptomics of Marine Microbial Communities
      • Successful Transposable Elements Secrets
      • Great Prairie Soil Metagenomics

      Confirmed speakers include:

      • Peer Bork, European Molecular Biology Laboratory (EMBL)
      • Ed Buckler, Cornell University
      • Dan Distel, Ocean Genome Legacy
      • Persis Drell, SLAC National Accelerator Laboratory/Stanford
      • Terry Hazen, Lawrence Berkeley National Laboratory (LBNL)
      • Scott Hodges, University of California, Santa Barbara
      • Tom Juenger, University of Texas at Austin
      • Rob Knight, University of Colorado
      • Ruth Ley, Cornell University
      • Mary Ann Moran, University of Georgia
      • Magnus Nordborg, Gregor Mendel Institute
      • Gene Robinson, University of Illinois at Urbana-Champaign
      • Christopher Scholin, Monterey Bay Aquarium Research Institute (MBARI)
      • Stephan Schuster, Penn State University
      • Pam Silver, Harvard
      • Jim Tiedje, Michigan State University
      • Mike Thomashow, Michigan State University
      • Jerry Tuskan, Oak Ridge National Laboratory/DOE JGI
      • Katherine Yelick, National Energy Research Scientific Computing Center (NERSC) at LBNL

      Workshops will include:

      • Integrated Microbial Genomes (IMG)/Metagenomes data analysis systems http://img.jgi.doe.gov/
      • Mycocosm fungal genomics portal provides data access, visualization, and analysis tools for comparative genomics of fungi http://genome.jgi-psf.org/programs/fungi/index.jsf
      • Phytozome provides data access and visualization tools for comparative plant genomics http://www.phytozome.net/
      • RNA Technologies & Analysis: a comprehensive suite for transcriptome interrogation, including RNA-Seq for expression profiling, etc.

      – Posted using BlogPress from my iPhone

      Interesting (though not completely convincing) take on NEJM farm microbe story

      There is an interesting take in Forbes on a recent NEJM study on microbes and people growing up on farms.

      Not totally convinced by the opinion of the writer, but worth a look.


      Calling for Nature Publishing Group to return all money charged for articles that were supposed to be free #OpenAccess

      Well, in case you did not see, yesterday I got really pissed off at Nature Publishing Group.  Short summary – many articles of mine that were supposed to be freely available on their journal sites were not.  For more information see

      People from Nature Publishing Group have responded quite quickly saying they will look into this and try to fix it, and indeed they have fixed many if not all of the mistakes in accessibility I found yesterday.  Glad they responded so quickly.  However, their response raises quite a few questions.  Like “what happened?” – as in – why did access get closed off?  And why were they charging for article use when they should not have been?
      It would be good for Nature to publish / post a full description of what went wrong.  And perhaps they will.  Apparently, it was just a glitch in the system.  Whatever the cause, however, almost certainly some people paid for access to articles that were supposed to be freely available.  I am calling on Nature here to audit their systems and return all money that was paid for such access.

      Today is a day to be annoyed with Nature (Publishing Group that is) #NatureFail

      Yuck.  Am getting really pissed off right now.  It is 1:30 AM.  I am tired.  And I am now angry.  I was writing a post about a recent trip, and wanted to link to an article I was a coauthor on.  The article was the paper on sequencing and analysis of the genome of Arabidopsis thaliana.  So I googled “Arabidopsis Genome Initiative” and found the link to the paper at Nature.  And much to my surprise I found this waiting for me:

      Why is that a surprise?  Because the genome paper is supposed to be freely available to all forever, under a policy Nature developed for papers reporting new genome sequence data.  I am tired, or I would write more about the history of this.  But another time.

      So then I looked for other genome papers I have published in Nature.  And so I looked for the Plasmodium genome paper.  And I got this:

      Grand.  That one was supposed to be free forever too.

      And so I looked at many others.  And for most, I got the same thing.  Not freely available.  If I were not at home, I would not have noticed this because I have access at work.  And I could get access at home by setting up the UC Davis library VPN system.  But fortunately I do not do that or I would not have discovered that Nature, not for the first time mind you, has turned articles that were supposed to be freely available forever into charge-for-access articles.  I know.  I know.  This is probably just some glitch in their system.  They really do seem committed to trying to make these available.  But clearly, either the system does not work well.  Or they are not committed to it.  Either way this is really annoying.  In some cases, the papers were sold to communities of scientists in part with the “These will be freely available to all forever” line as part of the sell.  I am deeply worried about my recent Genomic Encyclopedia paper, which is also supposed to be freely available forever.  Right now it still is, which is good.  But how long will that last?  And I note, though Nature people have said they would try and fix it, Nature still incorrectly claims copyright to that article on the PDF.  Personally, I like most of the people I know at Nature Publishing Group and like many of the things NPG does.  But this is getting really annoying.  And it just goes to show – the ONLY way to go, it seems, is full, complete Open Access, which the journals cannot then magically take away.

      Norman R. Pace visit to #UCDavis; discussing microbiology of the built environment #microBEnet

      Norman R. Pace, from the University of Colorado, Boulder, gave a talk at UC Davis last week about microbial diversity.  In his talk he discussed some of his recent Sloan Foundation funded work on “microbiology of the built environment” including studies of shower heads, indoor swimming pools, water supplies, and hospitals.

      Pace is one of the pioneers of DNA-based studies of microbes in the environment.  His initial work on studies of ribosomal RNA from uncultured organisms (started more than 20 years ago) helped launch the field.  For more information on his work see his lab page here.

      If you are interested in the microbes that are found in showerheads, his PNAS paper on this from 2 years ago (which can be found here) got a lot of press.  See for example this Science Friday segment and this New York Times article by Nicholas Wade.

      Pace was at UC Davis as part of the Storer Major Issues in Modern Biology Lecture Series.

      I note, I have written about Pace before a few times including this:
      Here’s hoping molecular classification/systematics of cultured & uncultured microbes wins #NobelPrize in medicine

      I note that we have a new project, as part of this Sloan program, to facilitate communication, networking, and information sharing.  My lab is creating something called “microBEnet” – the microbiology of the built environment network.  We are just getting our real site up and running.  For now you can find out some information at a temporary page http://microbenet.blogspot.com/

      Arsenic revisited: discussing arsenic story with a #UCDavis biology writing class next week

      Well, this could be fun. Next week I am making a guest appearance in a writing class at UC Davis. The class focuses on writing in biology and the instructor invited me to come in as a guest to coordinate a discussion of the arsenic paper and the coverage of it.
      When the instructor asked for reading assignments I said they should read:

      I think I probably should have suggested they read Zimmer’s excellent full write-up here.  Going to suggest that now, but it may be too late.

      Any other pointers to good write-ups of what has happened since the first week after the paper would be appreciated.

      ————-
      Some suggestions coming in from Twitter: