Summary of Undergraduate Genome Sequencing Project

(cross posted from microbe.net)

With the publication of the 6th and last genome paper to come out of our Undergraduate Genome Sequencing Project, I thought this would be a good time to reflect on how it all went.

To summarize, we had a group of undergraduate students go out into the built environment and attempt to find microbes whose genomes had not been sequenced. They then sequenced and assembled the genomes, and authored a short Genome Announcement publication for each genome. The goal was two-fold. The first goal was to give the students a real research experience that encompassed both lab work and bioinformatics. The second was to increase the number of reference genomes from the built environment.

It turned out to take a lot longer than we thought, and involved some dead ends along the way. However, the project was ultimately a success and the students appreciated being part of a real research project. I’ve since had several folks ask for details on the project, in order to do the same thing at their institutions. What we’ve decided to do is create a detailed step-by-step protocol for starting with a swab in hand and finishing with a Genome Announcement publication describing the genome assembly. In order to achieve this goal, we have a student, Madison Duntiz, here at UC Davis who is going to repeat the process from start to finish using some microbes left over from our Project MERCCURI collections. Along the way she will document everything in detail and we will publish the results here on microBEnet for anyone to use.

While waiting for that to finish up, I thought I would at least post the outline of the steps that we would recommend for a similar project. Obviously this is lacking a lot of detail, but I’d be happy to answer any questions while we work on the detailed version.

Basic outline of the protocol

-Collect microbes from your favorite built environment using sterile swabs

-Swab onto solid media plate, and grow the swabs in liquid to be plated out as well (note that the temperature of incubation and the type of media used will strongly influence the kinds of bugs you find)

-Dilution streak colonies of interest. Dilution streak again (having a mixed culture is bad news)

-Grow colonies up as overnight cultures

-Perform colony PCR using 16S primers directly on the bugs from the overnight cultures. The resulting PCR fragments get cleaned and then sent for Sanger sequencing either at a University or an outside company.

-Trim and align the resulting reads, and BLAST the consensus sequences to identify the organisms. In most cases you’ll probably also have to make a phylogenetic tree of the results in order to accurately identify the bugs. Choose a bug whose genome has not already been sequenced.

-Take that overnight culture and extract genomic DNA. Where you go from here depends on your resources and budget. Some people might give this DNA directly to a sequencing center, others (such as ourselves) might choose to make sequencing libraries themselves.

-Create sequencing libraries, preferably using a kit although there are other options.

-Confirm the quality of the sequencing libraries and normalize between libraries using qPCR. Submit the barcoded libraries for Illumina sequencing.

-Demultiplex the resulting reads and mentally prepare yourself for genome assembly.

-The process of trimming, error-correcting, assembling, scaffolding, and verifying the assembly is a whole field unto itself. However, to avoid this morass we used the super awesome A5 Assembly pipeline which does all of those steps for you and creates really high-quality assemblies to boot (full disclosure, this was developed in our lab… but is free, open-source, and easy to install and use).

-Submit the completed assembly to RAST for gene annotation.

-Submit the assembly to NCBI, and submit the reads to either SRA or someplace like Figshare.

-Take the information about the bug, the data from the assembly, the data from RAST and put together a Genome Announcements publication. Don’t forget you can’t submit the publication until you have an Accession # from NCBI.

-Submit the paper

-Once the paper is accepted, share your results with the world. Blog about it and enter the genome into the GOLD database.
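
For the write-up step, the assembly numbers that typically go into a Genome Announcement (contig count, total size, largest contig, N50) are easy to pull out of the assembly FASTA yourself. Below is a minimal Python sketch of that calculation. It is not part of A5 or RAST, the function names `read_fasta_lengths` and `assembly_stats` are made up for illustration, and the tiny in-memory FASTA is a stand-in for a real contigs file.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Return the length of each sequence in a FASTA-formatted file handle."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Summarize contig lengths: count, total bases, largest contig, and N50."""
    total = sum(lengths)
    running = 0
    n50 = 0
    # N50: walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "largest": max(lengths, default=0),
        "n50": n50,
    }


# Toy example (hypothetical data); with a real assembly you would pass
# open("your_contigs.fasta") instead of the StringIO object.
toy_fasta = StringIO(">c1\nACGTACGT\n>c2\nACGTAC\n>c3\nACGT\n>c4\nAC\n")
print(assembly_stats(read_fasta_lengths(toy_fasta)))
```

In the toy example the four contigs are 8, 6, 4, and 2 bp long, so the two longest together cover at least half of the 20 bp total and the N50 is 6. Real assembly tools often report these numbers for you, so treat this as a sanity check rather than a replacement.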