PacBio is hosting an evening symposium next week as part of another workshop I’m organizing on campus. All are encouraged to attend! Plenty of food available afterwards.
PacBio Sequence Assembly Workshop
Tuesday, December 17th 2013, 4 pm – 7 pm
The Auditorium, 1005 GBSF
4:00 pm Welcome & Introductions
4:00 – 4:30 pm Shane Brubaker, Solazyme
“Assembly, haplotyping, and annotation of a high GC algal genome.”
4:30 – 5:00 pm Jason Chin, PacBio
“String graph assembly for diploid genomes with long reads.”
5:00 – 5:30 pm Lex Nederbragt, University of Oslo
“Using PacBio reads to improve and validate the assembly of the complex Atlantic cod genome.”
5:30 – 6:00 pm Lawrence Hon, PacBio
“Larger genome hybrid assembly with PacBio.”
6:00 – 7:00 pm Reception & Discussions
Light Refreshments Will Be Served in GBSF Lobby
Interesting new #OpenAccess PNAS paper from C. Titus Brown: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs. Of course, if you follow Titus on Twitter or read his blog, you would know about this already: not only has he posted about it, but he also posted a preprint of the paper on arXiv in December.
Check out the press release from Michigan State. Some good lines there like “Analyzing DNA data using traditional computing methods is like trying to eat a large pizza in a single bite.”
A key point in the paper: “The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory.” This is important because right now most assemblers for genome data use a ton of memory.
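To make the Bloom filter idea a bit more concrete, here is a minimal sketch in Python of how k-mers might be stored and then "walked" as an implicit graph. This is not the code from the paper: the class name, the function names, the choice of hash function, and the filter size and number of hashes are all assumptions made up for illustration. The only point is the trade-off described in the quote: membership queries can return false positives (the "inexactly" part) but never false negatives.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter for k-mers (hypothetical, for illustration only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # True if every bit is set: may be a false positive,
        # but a k-mer that was added is never reported missing.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(bf, kmer):
    """Right-hand neighbors of kmer that the filter reports as present."""
    return [kmer[1:] + base for base in "ACGT" if (kmer[1:] + base) in bf]


if __name__ == "__main__":
    # Assumed toy parameters: 1024 bits, 3 hashes, k = 4.
    bf = KmerBloomFilter(num_bits=1024, num_hashes=3)
    seq = "ACGTACGGTCA"
    for km in kmers(seq, 4):
        bf.add(km)
    print(neighbors(bf, "ACGT"))
```

Note that the graph itself is never stored: edges are recovered by asking the filter whether each of the four possible next k-mers is present. Shrinking the filter saves memory but raises the false positive rate, which shows up as spurious neighbors.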
Anyway, the software behind the paper is available on GitHub here. Assemble away.
High throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high quality reference genome, and measuring assembly accuracy using some metrics. We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.
Note that this paper was supported by a grant from the National Science Foundation (ER 0949453). That grant is focused on comparative genomics (sequencing and analysis) of halophilic archaea. Stay tuned for more on that project as we are writing up a series of papers …
Some related links: