Harvard’s Moving To Open Access – Let’s Use this to Push for OA at other places

Well, Harvard is frequently criticized for being a bit conservative in responding to new ideas and initiatives. But it seems that recently Harvard is more like an oceangoing yacht than an oil tanker. And yesterday, the New York Times reported on a proposed new initiative that could make Harvard a leader in the movement towards “Open Access” publishing.

The Times reports

“Faculty members are scheduled to vote on a measure that would permit Harvard to distribute their scholarship online, instead of signing exclusive agreements with scholarly journals that often have tiny readerships and high subscription costs.”

and

“Under the proposal Harvard would deposit finished papers in an open-access repository run by the library that would instantly make them available on the Internet. Authors would still retain their copyright and could publish anywhere they pleased — including at a high-priced journal, if the journal would have them.”

In my opinion, there is no doubt this is a smart move. Sure, there are some potential downsides to open access. Some journals do good things and they may have to reinvent themselves to continue to bring in revenue. But welcome to the 21st century. It is not like other industries – like music and TV and movies and electronics and so on – have not had to reinvent themselves.

And the results are in – Harvard approved the initiative (see here for example). Now – I think we should use this as an example to get other institutions to do the same thing. As reported in the Boston Globe, Harry Lewis, a CS professor at Harvard, said:

“Harvard is in a unique position to do the right thing in the academic world,” he said. “In this case, I think others will be emboldened by Harvard to follow its lead, and the course of collective action will be greater than the course any individual school will take.”

I will do my best to get UC Davis to do the same thing, but given the animosity towards open access exhibited by our acting provost Barbara Horwitz, it may be a tough ride here. Fortunately, they are interviewing candidates for provost now and hopefully whoever they pick will be more supportive.

So – here is a call to others out there. Push for the same type of thing at your institution. I will be posting more on this in the coming days/weeks. Maybe collectively we can follow Harvard’s lead on this and make Universities more about what they are supposed to be about – spreading knowledge.

Creating Mitochondria and a sign that we need open peer review

Well, since everyone else is posting about this I figured I should too (see for example Steven Salzberg’s Blog, The Harvard Crimson, Pharyngula). If you have not heard yet, there is an article in the journal Proteomics that discussed how mitochondria must have been created by an intelligent designer.

For example on p8 the authors say:

“Alternatively, instead of sinking in a swamp of endless
debates about the evolution of mitochondria, it is better to
come up with a unified assumption that all living cells
undergo a certain degree of convergence or divergence to or
from each other to meet their survival in specific habitats.
Proteomics data greatly assist this realistic assumption that
connects all kinds of life. More logically, the points that show
proteomics overlapping between different forms of life are
more likely to be interpreted as a reflection of a single common
fingerprint initiated by a mighty creator than relying on
a single cell that is, in a doubtful way, surprisingly originating
all other kinds of life.”

Say what you want about the journal Proteomics but boy did they screw this one up. I think they probably should have caught this without much effort but who knows exactly what happened. In all fairness to them, it is possible for weird things to slip through at any journal. Reviewers are busy. Editors are busy. Everyone is busy. How can we prevent this from happening again? There is a simple change we could make that would help. It is called Open Peer Review. That is, if reviewers’ names were publicly attached to papers they reviewed, and their reviews were published, we would be less likely to see things like this happen. Then, if someone agreed to review a paper, they would be careful about it. Sure, we would probably have a harder time getting reviewers, but that would be better than publishing crap.

What are the risks with Open Peer Review? Well, some people might feel afraid to criticize others, especially people with power. Well, I find this sad. We scientists criticize our collaborators and friends ALL the time in private. Why not be public about it? Aren’t we supposed to be searching for the truth? If we are, shouldn’t we be willing to give our opinions in public forums?

Marco Island – Saving Some of the Best for Last

Well, the Marco Island AGBT meeting just wrapped up and they definitely saved some of the best for the end. I do not have a ton of time but here are my favorites:

Stephan Schuster gave a funny, entertaining and interesting talk on mammoth mitochondrial genomics. He riddled the talk with funny stories and one liners about his efforts to sequence old DNA and to publish the results. He did a good job of showing why Roche-454 sequencing is quite ideally suited for studies of old DNA.

John Leamon from Raindance Technologies summarized some of their work on droplet based microfluidics. He showed videos of droplets moving through their system and showed how it could be used for various digital PCR-like activities.

But it was the last talk of the whole meeting that really did blow my mind. It was from Steve Turner from Pacific Biosciences. He presented an overview of their sequencing technology as well as a tiny bit of data. Now, normally I am uninterested in marketing talks where little data is presented. But this talk was different. First, their technology clearly has enormous potential for revolutionizing the sequencing field. Basically, what they are doing is reading the activity of a DNA polymerase as it replicates a single DNA molecule and they do it in real time. He referred to this as using the DNA polymerase as a sequencing engine and then he took the crowd through the details of the technology and some of the modifications they have made to make it work better. I will try to post later with more detail on their methods.

No – they are not quite ready for prime time yet. But the potential is pretty absurd. I see two major advantages of their method if it can be fine tuned to work well: (1) it is screamingly fast, because in essence they let the polymerase do all the work, and (2) it can potentially get long reads. Right now he claimed they could do up to 1500 base pairs, and theoretically it could go much higher. If they can get this to work with reads of 10,000 bases, for example, this will completely reconfigure the field. No longer will one have to worry about the complexities of mapping short reads as with many of the current “new” methods. With the long reads, coupled with many molecules per run and the high speed, this technology is the first I have seen that has shown some results and that could really lead to the $1000 human genome. Again, it is not clear when/if they will be ready for release to the world, so don’t hold off buying one of the other systems that currently work (e.g., Illumina, Roche, ABI SOLiD) if you want to do “next gen” sequencing. But this company is one to keep an eye on.
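For those who want to see why read length matters for mapping, here is a minimal sketch. Everything in it is made up for illustration: the toy "genome", the 14-base repeat, and the helper names `occurrences` and `ambiguous_fraction` are my own assumptions, and real read mappers are far more sophisticated than exact string matching. The point is only that a read shorter than a repeat can match in more than one place, while a read long enough to reach into unique flanking sequence cannot.

```python
# Toy "genome" (made up for illustration): three unique stretches separated
# by two copies of the same 14-base repeat.
REPEAT = "GATTACAGATTACA"
GENOME = "ACGTTGCA" + REPEAT + "CCGGAATT" + REPEAT + "TTAGGCCA"


def occurrences(genome, read):
    """Count exact (possibly overlapping) matches of `read` in `genome`."""
    count = 0
    start = genome.find(read)
    while start != -1:
        count += 1
        start = genome.find(read, start + 1)
    return count


def ambiguous_fraction(genome, read_length):
    """Fraction of error-free reads of this length that match more than one place."""
    starts = range(len(genome) - read_length + 1)
    ambiguous = sum(
        1 for s in starts
        if occurrences(genome, genome[s:s + read_length]) > 1
    )
    return ambiguous / len(starts)


if __name__ == "__main__":
    # Compare reads shorter than the repeat with reads longer than it.
    for length in (10, 20):
        print(length, ambiguous_fraction(GENOME, length))
```

In this toy case some of the 10-base reads fall entirely inside the repeat and cannot be placed uniquely, while none of the 20-base reads are ambiguous because each one spans into unique sequence.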


Coolest Thing at Marco Island – The Polonator


Without a doubt, the coolest thing at AGBT/Marco Island is the Polonator. This is a new sequencing system built by Danaher Motion based on the polony method from George Church’s lab. Now I have no idea if this machine even works. And even if it works, I have no idea how useful it will be. But the idea is brilliant and appealing. They are trying to be an “Open Development” Massively High Throughput Sequencing system. By Open they mean they will use open source software (and would love developers to help), will use non-proprietary reagents, and supposedly will try to make everything as cheap as possible. They will still make the main piece of equipment and I assume this is what they hope to make money off of.

I went to a showing in a room near the seminar room. There were many skeptics there. Most were concerned about the read length coming from the machine. But it should get better. And if the price is a lot less for reagents and the machine than anyone else’s system – this could be an important player in the market.



More notes from Marco Island/ AGBT

Some notes on talks here:

My favorite talk yesterday morning was by David Cox from Perlegen. He had as usual some good one liners including “Everybody and their mother is doing this so doing this is not so novel. What is novel about it is that it worked.” I should add that David Cox helped shape my career indirectly in many many ways. When I was a PhD student at Stanford, I got into genomics in part by teaching a course with David Botstein, Rick Myers and David Cox. When Craig Venter offered me a job at TIGR in 1998, I was not sure if moving to a non university was a good idea or not. So I asked many people for their opinions. Some said “You must do an academic post doc or you will never get a faculty job.” I pretty much knew to ignore those folks. Cox gave the best advice. He said as long as I published things while at TIGR, it would not hurt me in any way. It probably would help. And so I took the job. And no doubt that was a great career move.

Other talks that were good were one by Joe Ecker, who discussed methylation in Arabidopsis and one by Andy Clark.

I skipped out on some of the lunch time to finish my talk for the PM session and also worked on my talk in the back of the room during the other PM talks. The PM session was on metagenomics and the most pleasing thing was that David Relman did not show up and he was replaced by Peter Turnbaugh from Jeffrey Gordon’s lab. Now – I am not saying it was good that Relman was not there – he usually gives smashingly good talks. But Turnbaugh, a PhD student, stepped in as pinch hitter and gave a great talk on gut microbiome studies, really setting the stage for the whole session. I do not know if he was nervous stepping into a session like this but it did not show if he was. He certainly seemed relaxed when he said “Thanks to Dr. Relman for getting stuck in Chicago.”

Forest Rohwer gave a good talk on metagenomics, pointing out that viruses still get ignored in this field relative to their likely importance in communities. I have written about Forest before so I am going to discuss the other talks more … but if you have not heard him talk before try to find a way. He has a VERY different perspective on genomics and metagenomics than most of the people doing it. And he is dead right about the need to do more work on viruses.

Garth Ehrlich gave a talk on “bacterial plurality” and why he thinks gene content variation within communities of microbes in biofilms is important. His data certainly seemed solid and he showed some results that call into question some aspects of the “pangenome” hypothesis (he showed that the total number of genes in the Streptococcus strain collection does seem to level off after sequencing ~ 30 genomes and thus that the number of genes is not infinite as some people have suggested). So I liked some aspects of his talk. But he did make some evolution statements I found disagreeable (for those who care about the nitty gritty – he showed a cluster diagram of strain similarity and then used the position of strains within the cluster diagram to reflect relative branching order and historical patterns. A cluster diagram is a bad thing to use and one should use a phylogenetic tree for this. In addition he implied that one could make a genome-phylogeny from gene presence/absence information that would be more robust than a standard alignment phylogeny. This is not a reasonable thing in my opinion – gene presence/absence patterns tend to end up grouping together unrelated lineages that have separately undergone gene loss. I just do not understand why people so badly want to not use alignments to build trees). Anyway – overall many of the things he said were interesting but I find certain non-evolution evolutionary analyses really grating.
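To make the gene presence/absence complaint concrete, here is a small sketch with entirely hypothetical data. The four genomes, the eight genes, and the assumed "true" history are all invented for illustration, and Jaccard distance is just one common way to compare presence/absence profiles (I am not claiming it is what was used in the talk). In the assumed history, free_1 and reduced_1 are close relatives, as are free_2 and reduced_2, and the two reduced lineages lost genes g3 to g6 independently.

```python
# Hypothetical presence/absence profiles (1 = gene present) for genes g1..g8.
# Assumed true history: (free_1, reduced_1) are relatives and (free_2, reduced_2)
# are relatives; each "reduced" lineage lost g3-g6 on its own.
PROFILES = {
    "free_1":    [1, 1, 1, 1, 1, 1, 1, 0],
    "reduced_1": [1, 1, 0, 0, 0, 0, 1, 0],
    "free_2":    [1, 1, 1, 1, 1, 1, 0, 1],
    "reduced_2": [1, 1, 0, 0, 0, 0, 0, 1],
}


def jaccard_distance(a, b):
    """1 - (genes present in both) / (genes present in either)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1 - shared / union if union else 0.0


def closest(name):
    """Name of the genome whose gene content is most similar to `name`."""
    others = (n for n in PROFILES if n != name)
    return min(others, key=lambda n: jaccard_distance(PROFILES[name], PROFILES[n]))


if __name__ == "__main__":
    for name in PROFILES:
        print(name, "->", closest(name))
```

With these made-up profiles the two reduced genomes come out as each other's closest match, and so do the two free-living ones, which is the opposite of the assumed history. Shared absence of genes looks like shared ancestry when it is really just parallel loss.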

Anyway – I was going to ask him a question after his talk about this, but then decided that, since I was talking next, getting into an argument with him just before my talk might seem lame. So I passed on the question. And then I gave my talk on the need to fill in the tree of life in terms of genome sequencing projects. I discussed a project we are just wrapping up that was part of the NSF “Tree of Life” program in which we sequenced genomes of eight bacteria that are from phyla that at the time had no genomes available. And then I talked about a new project I am coordinating at the Joint Genome Institute in which we are sequencing 100 genomes to really fill in some of the bacterial and archaeal tree. Next week I will post more about this project but I note – this is not done to study the tree of life per se. It is being done because if we have reference genomes from across the tree, all of our genome analyses of other systems and of metagenomes get better.

After dinner and some shell collecting on the beach, there were evening talks and I went to the informatics session. Some of the talks there were good but the best thing I saw there was someone (I think Ben Blackburne) saying his slides were going to be on something called slideshare.net. I had never heard of this and checked it out and it seems pretty cool. I may use it in the future … but gotta go off to other things.

Marco Island Evening One – The Strange and the Good

Well, I made it through my talk at Marco Island without too many scars. It seemed to go pretty well – I talked about a new project in which I am involved at the Joint Genome Institute on creating a Genomic Encyclopedia for Bacteria and Archaea. I will write about that more here at another time.

But what I want to do now is discuss some of the marketing ploys from last night. One of the strangest was from the Pacific Biosciences group which sponsored a beach party with fireworks. It was completely surreal. People lingering at the beach with drinks and loud music and then all of a sudden – fireworks were launched into the sky. Not the “greenest” of activities I must say. But never mind that. What was the reason for fireworks in the middle of February? I guess the company is trying to make a big splash but the whole thing was just strange to me.

Much better was the party sponsored by Genome Technology magazine. It was a few hundred yards down the road at a bar. Everyone had to walk there which was good since many people end up never leaving the halls around the conference area. And the place was packed to the gills with people drinking and eating and seemingly having a good time. No fireworks (thankfully) and a good respite from the hotel.

Needless to say, these types of festivities do not happen at any evolution or ecology conference I have been to. The genomics world is still heavy on the marketing and self promotion. Sometimes that makes it fun (Genome Technology) and sometimes it just makes me want to run away (Pacific Biosciences).

AGBT Marco Island Update – Long Live Sequencing

Well, I am sitting in the back of the room at the AGBT meeting and just heard Eric Green give the introduction and Joe Ecker is talking right now. And the theme of the meeting is pretty clear:
LONG LIVE SEQUENCING

Basically, the meaning of this is that, though many said sequencing was dead a few years ago, sequencing is alive, thriving, and going a bit crazy. With the new massively parallel high throughput sequencing machines, sequencing is being used for everything and anything. For example, Ecker is using sequencing to study methylation of the genome of Arabidopsis. And others are using sequencing for expression studies. And of course there is population genetics. And genetic mapping. And my favorite – metagenomics. And so on. So, despite the push to move into a “post genomics” world, sequencing is growing in use not shrinking.

What kid would want to study bacterial evolution when they grow up?

OK – I am a bit scared by this, because it shows how little I have changed since I was young. And in my memory, I was not a total science geek for my whole life (you know – I focus on the fact that in high school, I played baseball and hockey and other sports pretty seriously, I guess my memory skips over that I was captain of the math team too).

But I was digging through some old papers and found this … a paper I wrote in ninth grade. We got to select a topic for the paper and mine … “Describe one step in the evolution of a bacterium.” The funny thing is — I do not remember this at all. I mean, I remember reading books by Gould that got me interested in evolution. But surely Gould did not write a lot about bacterial evolution. Where did I come up with this topic? I haven’t a clue. Anyway – here is the essay – errors, fluffy handwriting, and all.

[Embedded slideshow: scans of the ninth-grade essay]

Leslie Orgel – Still Speaking Wisely and Openly Even After Death

OK – my mind has been blown. Leslie Orgel, who just passed away recently, has a new Essay in PLoS Biology called “The Implausibility of Metabolic Cycles on the Prebiotic Earth.” Anyone interested in the origin of life should check this out.

He had me at the beginning … and as usual has very clear discussions of the steps needed for life to have originated:

If complex cycles analogous to metabolic cycles could have operated on the primitive Earth, before the appearance of enzymes or other informational polymers, many of the obstacles to the construction of a plausible scenario for the origin of life would disappear. If, for example, a complex system of nonenzymatic cycles could have made nucleotides available for RNA synthesis, many of the problems of prebiotic chemistry would become irrelevant. Perhaps a simpler polymer preceded RNA as the genetic material—for example, a polymer based on a glycerol-phosphate backbone [5] or a phosphoglyceric acid backbone. Could a nonenzymatic “metabolic cycle” have made such compounds available in sufficient purity to facilitate the appearance of a replicating informational polymer?

The paper then discusses details of various metabolic cycles and why the current evidence is not completely convincing in terms of the exact path that was taken in the origin of life. Note to ID supporters – this does not friggin’ mean that he is saying life could not have originated from non living systems. He is simply pointing out that our understanding of it is incomplete. As, by the way, is our understanding of how blood works. But that does not stop us from thinking that blood does in fact, well, work.

Anyway, once you get over the fact that some ID supporters will misuse his work, the end is a great call for what needs to be done:

The prebiotic syntheses that have been investigated experimentally almost always lead to the formation of complex mixtures. Proposed polymer replication schemes are unlikely to succeed except with reasonably pure input monomers. No solution of the origin-of-life problem will be possible until the gap between the two kinds of chemistry is closed. Simplification of product mixtures through the self-organization of organic reaction sequences, whether cyclic or not, would help enormously, as would the discovery of very simple replicating polymers. However, solutions offered by supporters of geneticist or metabolist scenarios that are dependent on “if pigs could fly” hypothetical chemistry are unlikely to help

Yes, that is right, he got “if pigs could fly” into a paper. He was a great scientist. And it is nice for me to see one more paper of his. And this one, unlike pigs, can fly forever, because it is truly OA.

Syphilis origin solved?

A new paper in PLoS Neglected Tropical Diseases attempts to tackle a rather sensitive topic – the evolutionary origin of treponematoses (i.e., diseases caused by bacteria in the genus Treponema including yaws, pinta, and the one everyone wants to know about – syphilis). The paper, by Kristin Harper and others, uses evolutionary reconstructions to try and determine if good old Christopher Columbus played a role in bringing syphilis from the New World to Europe.

There was an extensive article about this study in the New York Times on 1/15/08 (by John Noble Wilford). Overall the Times article is good (except Wilford gets the definition of phylogenetic analysis a bit wrong – saying it is the study of the evolutionary relationships between organisms when really it is the study of evolutionary relationships of anything… but hey that is OK).

I confess I am not sure if I am completely convinced by all of Harper et al’s arguments concerning the evolution of these bugs. My main concern is that the amount of variation they observe (in ~ 20 genes across these strains) is very very low. And thus the resolution of the phylogenetic trees is quite poor.

Because this is a PLoS paper, it is truly “Open Access” and I can include the Figure here in my blog as long as I cite the original source (see below). If you look at the tree you can see some #s on the branches in the tree. These are based on a statistical test called bootstrapping and the numbers indicate (roughly) how well the tree that is shown represents all of the polymorphisms in the data. The #s are percentages and alas the % support is not very high for many of the branches. So a better resolution of the question of the origin of these diseases will likely require, well, a better resolution on the tree. This in turn will likely require complete genome sequences and perhaps more strain samples. Nevertheless, given the results they have, their arguments seem sound … and this should stimulate people to gather more genomic data from these bugs.
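For anyone unfamiliar with bootstrapping, here is a rough sketch of the resampling idea. It is NOT what Harper et al. did: they built maximum likelihood and maximum parsimony trees, whereas this toy swaps in a much cruder grouping rule (two strains count as "grouped" if they differ from each other at fewer sites than either does from any other strain). The four sequences and all the function names are made up for illustration.

```python
import random

# Hypothetical toy alignment: four invented sequences of equal length.
ALIGNMENT = {
    "strain_A": "ACGTACGTACGTACGTACGT",
    "strain_B": "ACGTACGTACGAACGTACGT",
    "strain_C": "ACGTTCGTACGTACCTACGA",
    "strain_D": "ACGTTCGTACGTACCTTCGA",
}


def hamming(s, t):
    """Number of aligned sites at which two sequences differ."""
    return sum(1 for a, b in zip(s, t) if a != b)


def grouped_together(alignment, x, y):
    """Crude stand-in for 'x and y form a branch': they are strictly closer
    to each other than either is to any other sequence (ties do not count)."""
    d_xy = hamming(alignment[x], alignment[y])
    others = [n for n in alignment if n not in (x, y)]
    return all(
        d_xy < hamming(alignment[x], alignment[o])
        and d_xy < hamming(alignment[y], alignment[o])
        for o in others
    )


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to build a pseudo-replicate."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, x, y, replicates=1000, seed=0):
    """Percentage of pseudo-replicates in which x and y are still grouped."""
    rng = random.Random(seed)
    hits = sum(
        grouped_together(resample_columns(alignment, rng), x, y)
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


if __name__ == "__main__":
    print("A+B support (%):", bootstrap_support(ALIGNMENT, "strain_A", "strain_B"))
```

The thing to notice is that when only one or two sites separate a pair of strains from the rest, many resampled datasets will happen to leave those sites out, and the support percentage drops. That is the same problem the low variation causes in the real tree.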

Also see some other blogs on this

  • Neil Woodburn
  • John Dennehy (who mentions an article by Carl Zimmer in a non OA publication … find out which by going to his blog)

Here is Figure 3. It is from Harper KN, Ocampo PS, Steiner BM, George RW, Silverman MS, et al. (2008) On the Origin of the Treponematoses: A Phylogenetic Approach. PLoS Negl Trop Dis 2(1): e148. doi:10.1371/journal.pntd.0000148

Figure 3. This maximum likelihood tree is based on 20 polymorphic regions in the T. pallidum genome. Bootstrap support was estimated with 1,000 replicates in order to assess confidence at branching points and are shown within circles where values are high (>90%). Bootstrap support values for both maximum likelihood and maximum parsimony trees are shown, in that order.