Interesting new #OpenAccess PNAS paper from C. Titus Brown: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Of course, if you follow Titus on Twitter or read his blog, you would know about this already: not only has he posted about it, but he also put a preprint of the paper on arXiv in December.
Check out the press release from Michigan State. Some good lines there like “Analyzing DNA data using traditional computing methods is like trying to eat a large pizza in a single bite.”
A key point in the paper: “The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory.” This is important because right now most assemblers for genome data use a ton of memory.
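For anyone who has not run into Bloom filters before, here is a rough Python sketch of the general idea: hash each k-mer to a few positions in a fixed-size bit array and set those bits. This is only an illustration and not the code from the paper; the class name `KmerBloomFilter`, the helpers `kmers` and `successors`, the BLAKE2 hashing, and the parameter choices are all my own assumptions.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter for k-mers (illustrative, not the paper's code)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode("ascii"),
                digest_size=8,
                salt=seed.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True for every k-mer that was added; may also be True for
        # some k-mers that were never added (false positives).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def successors(bf, kmer):
    """Yield the possible next k-mers that the filter reports as present."""
    for base in "ACGT":
        candidate = kmer[1:] + base
        if candidate in bf:
            yield candidate


if __name__ == "__main__":
    seq = "ACGTACGGTCA"
    k = 5
    n = len(seq) - k + 1
    bf = KmerBloomFilter(num_bits=4 * n, num_hashes=3)  # about 4 bits per k-mer
    for km in kmers(seq, k):
        bf.add(km)
    # Bloom filters never give false negatives, so this prints True.
    print(all(km in bf for km in kmers(seq, k)))
    # Graph edges are never stored; they are found by querying the filter.
    print(list(successors(bf, "ACGTA")))
```

The "inexactly" part of the quote is the false positives: a k-mer that was never added can still appear to be present, so the graph you walk may contain a few extra nodes. The memory, though, is fixed by the size of the bit array rather than by storing the k-mers themselves.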
Anyway, the software behind the paper is available on GitHub here. Assemble away.