PacBio Sequence Assembly Workshop

PacBio is hosting an evening symposium next week as part of another workshop I’m organizing on campus. All are encouraged to attend! Plenty of food available afterwards.

Tuesday, December 17th 2013, 4 pm – 7 pm

The Auditorium, 1005 GBSF

4:00 pm                     Welcome & Introductions

4:00 – 4:30 pm        Shane Brubaker, Solazyme

“Assembly, haplotyping, and annotation of a high GC algal genome.”

4:30 – 5:00 pm         Jason Chin, PacBio

“String graph assembly for diploid genomes with long reads.”

5:00 – 5:30 pm         Lex Nederbragt, University of Oslo

“Using PacBio reads to improve and validate the assembly of the complex Atlantic cod genome.”

5:30 – 6:00 pm         Lawrence Hon, PacBio

“Larger genome hybrid assembly with PacBio.”

6:00 – 7:00 pm        Reception & Discussions

Light Refreshments Will Be Served in GBSF Lobby

Nice new memory-efficient metagenome assembly method from C. Titus Brown

Interesting new #OpenAccess PNAS paper from C. Titus Brown: Scaling metagenome sequence assembly with probabilistic de Bruijn graphs.  Of course, if you follow Titus on Twitter or read his blog, you would know about this already: not only has he posted about it, he also posted a preprint of the paper on arXiv in December.

Check out the press release from Michigan State.  There are some good lines in there, like “Analyzing DNA data using traditional computing methods is like trying to eat a large pizza in a single bite.”

A key point in the paper: “The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory.” This is important because right now most assemblers for genome data use a ton of memory.
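To make the idea a bit more concrete, here is a toy sketch of how k-mers can be stored in a Bloom filter and how graph neighbors can then be found by querying it. This is only an illustration and not the code from the paper: the class name, the function names, the SHA-256-based hashing, and the filter size are all assumptions made up for this example.

```python
import hashlib


class KmerBloomFilter:
    """Toy Bloom filter for k-mers (illustrative; not the paper's implementation)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # packed bit array

    def _positions(self, kmer):
        # Assumption: derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        # May report false positives, but never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer)
        )


def kmers(sequence, k):
    """Yield every overlapping k-mer in a sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def neighbors(bloom, kmer):
    """Candidate successor k-mers: shift by one base and ask the filter."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


# Hypothetical read and parameters, chosen only for illustration.
read = "ACGTACGGTCA"
bloom = KmerBloomFilter(num_bits=64, num_hashes=2)
for km in kmers(read, 5):
    bloom.add(km)

print("ACGTA" in bloom)
print(neighbors(bloom, "ACGTA"))
```

The k-mers themselves are never stored, only the bits they hash to, which is where the memory savings come from. The price is the "albeit inexactly" part of the quote: a query can occasionally say yes for a k-mer that was never added, so the neighbor list may include a few spurious edges.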

Anyway the software behind the paper is available on GitHub here.  Assemble away.

New publication from the lab: Assemblathon 1: A competitive assessment of de novo short read assembly methods

Aaron Darling from the lab is an author on a new paper just published: Assemblathon 1: A competitive assessment of de novo short read assembly methods.

New paper from my lab (& the Facciotti lab): Mauve Assembly Metrics #Halophiles #Genomics

Just a quick post here. A new paper from my lab has come out in Bioinformatics. The paper is relatively simple. Titled “Mauve Assembly Metrics,” it reports the work of Aaron Darling and Andrew Tritt (with some minor contributions from me and Marc Facciotti). Aaron wrote the program Mauve when he was a student in Nicole Perna’s lab at Wisconsin: Mauve: multiple alignment of conserved genomic sequence with rearrangements. Over the years he (and others) have continued to develop the program and have written a few more papers, including, for example, progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. This new paper basically reports a system (a set of scripts) for measuring assembly quality. Here is the abstract:

High throughput DNA sequencing technologies have spurred the development of numerous novel methods for genome assembly. With few exceptions, these algorithms are heuristic and require one or more parameters to be manually set by the user. One approach to parameter tuning involves assembling data from an organism with an available high quality reference genome, and measuring assembly accuracy using some metrics. We developed a system to measure assembly quality under several scoring metrics, and to compare assembly quality across a variety of assemblers, sequence data types, and parameter choices. When used in conjunction with training data such as a high quality reference genome and sequence reads from the same organism, our program can be used to manually identify an optimal sequencing and assembly strategy for de novo sequencing of related organisms.

Check out the paper: Mauve Assembly Metrics. Download the scripts/code from http://ngopt.googlecode.com, along with Mauve, play around, and let me know what you think.
Note that this paper was supported by a grant from the National Science Foundation (ER 0949453). That grant is focused on comparative genomics (sequencing and analysis) of halophilic archaea. Stay tuned for more on that project, as we are writing up a series of papers ...
Some related links: