For those interested in so-called “third generation” DNA sequencing systems, this week has had some buzz with the release of a publication in Nature Biotechnology reporting the sequencing and analysis of a human genome using a Helicos Heliscope sequencer. In this paper Stephen Quake and colleagues generated short read sequences from Quake’s DNA using this machine and then analyzed them by comparing them to reference human genomes.
Certainly, what they did was cool. And the use of the Helicos equipment is a good thing for that company and its development of single molecule sequencing. Given the “race” (if you want to call it that) for the $1000 genome, it is not surprising that this paper received a lot of coverage from all sorts of angles, because they claim it involved the cheapest sequencing of a human genome yet achieved.
So first I want to commend Quake and Helicos for an important step in third generation sequencing. Quake, mind you, is one guy who is constantly inventing cool new techniques of great use in genomics and biology, and he is always worth checking out.
But in this case, some aspects of what they claim to have achieved are very off-putting. In particular, I am concerned with the supposed “democratization of sequencing” that they think this project embodies (e.g., see some of the quotes in this). The basis for their conclusion that democratization has happened here is that they believe this sequencing (of Quake’s genome) was done at lower cost and with less effort than previous human genome sequencing efforts. To back this up they provide a table (Supplemental Table 1) detailing estimates of these values for 8 human genome papers (the original Lander et al and Venter et al ones, as well as Watson’s genome, etc).
In essence Quake et al are doing the following math (my formula, not theirs, but their discussions imply basically this):
D = B/(E*C)
Democratization factor (D) = # of bases sequenced (B) / (amount of effort (E) * cost (C))
That is, with more sequence, less effort, or less cost, the more democratized sequencing is. Sounds fine in some ways. Except when you look at the details.
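To make the implied formula concrete, here is a minimal sketch in Python. The function name `democratization_factor` and every number in it are my own placeholders for illustration, not values from Quake et al or any other paper. It only shows how D responds when one of the inputs changes.

```python
def democratization_factor(bases: float, effort: float, cost: float) -> float:
    """Implied formula from the post: D = B / (E * C)."""
    if effort <= 0 or cost <= 0:
        raise ValueError("effort and cost must be positive")
    return bases / (effort * cost)


# All inputs are hypothetical, unitless placeholders (not real project data).
baseline = democratization_factor(bases=100.0, effort=10.0, cost=5.0)
half_cost = democratization_factor(bases=100.0, effort=10.0, cost=2.5)
half_effort = democratization_factor(bases=100.0, effort=5.0, cost=5.0)

print(baseline, half_cost, half_effort)  # prints: 2.0 4.0 4.0
```

Halving either the reported cost or the reported effort doubles D, so the result is only as trustworthy as the way each input was measured.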
For example, consider the cost (C) of the sequencing. They report that the cost for the sequencing was < $50,000. But this number is misleading since, for example, they do not include any aspect of the cost of actually buying and setting up the machine. (For more detail on the flaws in the cost calculation and on the whole story, see Times Online, Dan MacArthur at Genetic Future, and GenomeWeb.)
However, more disconcerting to me is what they do with the rest of the implied calculation.
For example, they treat all the projects as though they are essentially equal in terms of total number of bases sequenced (B) because, I guess, all were sequencing human genomes. But this is not fair, since the depth and quality of sequencing vary between the projects. In addition, more recent projects, such as theirs, make use of the data from prior projects, which allows them to gather less data. For example, in their paper they assemble the genome by tiling the reads against reference genomes, which allows them to use lower coverage than would be required for de novo assembly of a genome.
But even worse – the way they calculate effort required (E) is flabbergasting.
They seem to infer this in two ways. First, they make use of the number of runs of the machine that are required. They apparently used four runs while they claim that the use of second generation sequencing methods required many more runs. And many have been questioning this claim (e.g., see Chad Nusbaum’s quotes in the GenomeWeb article).
It is the second way that they infer effort that is perhaps the most annoying. They infer this from the number of authors on the papers describing the sequencing of these human genomes (e.g., in Supplemental Table 1 they say “number of authors” is “an estimate of labor”). And the big thing for Quake et al is that there are only three authors on their paper and dozens to hundreds on other human genome papers. Based on this lower number of authors they conclude that their work required less effort and discuss this as evidence for further democratization of sequencing.
Now suppose we gloss over the fact that there is no way to infer amount of effort from number of authors (e.g., letters to the editor, which usually do not require a lot of effort, can sometimes have hundreds of authors, while Origin of Species had but one author and was, shall we say, a lot of work). Even worse to me is that they are comparing their paper, which is focused almost entirely on the technical aspects of the sequencing, with other papers that spend much more effort on studying and discussing what the genomes might mean. For example, the Venter/Celera and the public human genome papers are complex, detailed volumes with analysis of everything you could think of. To compare the effort required to do this with the effort required for the Quake paper, which was pretty much assembly and analysis of SNPs, is inappropriate and actually offensive.
Given the number of ways in which they have oversold how their project has reduced effort and cost for sequencing a human genome, and how this implies democratization, I am giving Quake and Helicos my coveted “Overselling genomics award”. Again, it is not that what they did was not cool or interesting, but overselling it detracts from everything they achieved.