It’s been a while since our last post but we finally (after many technical challenges and more than our share of bad luck) have data and are in the process of analyzing it!
I’ve worked in other labs before this one and I always forget how much fun it is to finally take a step back and analyze the data you’ve been working so hard to obtain. I thought I’d share some of what we’re up to with you.
Demultiplexing and Assembly
The data generated by Illumina sequencing takes the form of many thousands of short reads (about 600bp each). The sequencer also performs some preliminary error-checking and clean-up on the reads so the sequence is easier to work with. Since we pooled our samples into one well, our first step was to separate the reads by barcode, a process called demultiplexing.
Unfortunately, none of my reads showed up in the demultiplexed data from the sequencer. When we went back and re-ran the demultiplexer on the raw data, from before the sequencer's processing, we found that the THU reads were present but had been thrown out as errors: they carried the barcode previously assigned to Amanda, whose library was not part of this sequencing run. We concluded that this was most likely due to a mix-up during library preparation, and we later verified that the reads were truly THU using a whole-genome BLAST.
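For anyone curious what sorting reads by barcode looks like in practice, here is a minimal Python sketch. It is only an illustration, not the demultiplexer we actually ran: the barcodes, sample names and reads are made up, and it assumes the barcode sits at the very start of each read and must match exactly. Real tools usually read the barcode from a separate index read and tolerate a mismatch or two.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by the barcode found at the start of each read.

    Reads whose barcode is not in the lookup table go into an
    "unassigned" bin instead of being silently dropped.
    """
    bins = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_len]
        insert = read[barcode_len:]  # the sequence with the barcode trimmed off
        sample = barcode_to_sample.get(barcode, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCGGG", "GGCCATATAT"]

demuxed = demultiplex(reads, barcodes, barcode_len=4)
for sample, sample_reads in sorted(demuxed.items()):
    print(sample, len(sample_reads))
```

Keeping an "unassigned" bin is the useful part here: reads with an unexpected barcode stay visible, which is exactly the kind of place our missing THU reads turned up.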
After demultiplexing, we used the A5 pipeline (assembly software developed in the Eisen lab) to assemble the reads into contigs and then scaffolds. Contigs are contiguous stretches of DNA sequence built by lining up reads against each other, using the regions where they overlap as a guide. Scaffolds are larger sections made up of contigs that have been placed in order, sometimes with gaps of unknown sequence between them. (Nature Education has a helpful diagram here: http://www.nature.com/scitable/content/anatomy-of-whole-genome-assembly-20429)
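To make the idea of a contig a little more concrete, here is a toy Python sketch that joins reads wherever the end of one exactly matches the start of the next. This is not how A5 works internally. It is a deliberately simplified illustration: it assumes the reads are error-free, already in the right order and all on the same strand, and the function names and example reads are made up.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that is also
    a prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Join two reads across their overlap, or return None if they
    do not overlap by at least `min_overlap` bases."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


def build_contig(ordered_reads, min_overlap=3):
    """Extend a contig one read at a time, left to right."""
    contig = ordered_reads[0]
    for read in ordered_reads[1:]:
        merged = merge_reads(contig, read, min_overlap)
        if merged is None:
            raise ValueError(f"read {read!r} does not overlap the contig")
        contig = merged
    return contig


# Hypothetical reads, for illustration only.
reads = ["ATGGCGT", "GCGTACC", "ACCTTGA"]
contig = build_contig(reads)
print(contig)
```

A real assembler has to cope with sequencing errors, repeats, reads from both strands and millions of reads in no particular order, which is why we leave that job to a dedicated pipeline.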
Once the draft genome was assembled into scaffolds, we submitted the scaffold data to RAST, a genome annotator. Genome annotation software such as RAST searches submitted DNA sequences to identify known genes and gene families. It also has a tool for comparing genomes to each other. Below is a summary of the RAST annotation of my organism.
I still have a lot of analyzing left to do, but it’s wonderful to finally be at this step!