Undergraduate Genome Project – Jonathan Eisen's Lab

Update on Curtobacterium and Other Musings

In my first year in the Eisen lab, I was lucky to be able to participate in the Undergraduate Genome Sequencing Project, in which I published the draft genome of Curtobacterium flaccumfaciens, the first of its genus. An important aspect of this project was blogging about what we were doing: all the successes, the failures, and everything in between. That was something I was terrible at, as evidenced by my one, maybe two, blog posts. However, the longer I have been in this lab, the more significant I find social media in science, both to myself and to the world.

Almost a year after the paper was published, the Eisen lab received an email inquiring about my blog post on Curtobacterium and the difficulties we had with getting enough active DNA and continuing with sequencing. They wanted to know if we were having trouble with DNA extractions on the bacteria, especially since they were interested in sequencing other species of Curtobacterium and were worried that the genus was finicky. We had later found that the viability of our ligase decreased with each successive freeze-thaw, which caused the huge issue in DNA library prep, so we were able to inform them that extracting DNA and sequencing Curtobacterium should be a relatively painless process.

There were two things that struck me as interesting when David, my supervisor on the project, informed me about the email exchange. First, it was awesome that a blog post that I, an insignificant undergraduate, wrote was seen by other researchers and contained information (as small as it was) that could help them in their research. Second, and more abstract, science has increasingly become a collaborative effort. When I originally thought about sharing in science, the infamous Koch-Pasteur rivalry quickly came to mind. Information simply wasn’t shared as readily at that time. I like to think idealistically that the idea of hoarding information to get ahead of contemporaries has become less common and that science will become even more collaborative than it is now. I also like to think that the idea of charging to view more than just the abstract will cease to exist and that the number of open-access articles will continue to grow, because at the root of research (at least originally) is the pursuit of knowledge and the dissemination of information. Just some musings I had, and who am I to talk? I haven’t even graduated undergrad yet and haven’t joined the race to find the richly rewarding cure for cancer.

Summary of Undergraduate Genome Sequencing Project

(cross posted from microbe.net)

With the publication of the 6th and last genome paper to come out of our Undergraduate Genome Sequencing Project I thought this would be a good time to reflect on how it all went.

To summarize, we had a group of undergraduate students go out into the built environment and attempt to find microbes whose genomes had not been sequenced. They then sequenced and assembled the genomes and authored a short Genome Announcement publication for each genome. The goal was two-fold: first, to give the students a real research experience that encompassed both lab work and bioinformatics; second, to increase the number of reference genomes from the built environment.

It turned out to take a lot longer than we thought, and involved some dead ends along the way. However, the project was ultimately a success and the students appreciated being part of a real research project. I’ve since had several folks ask for details on the project, in order to do the same thing at their institutions. What we’ve decided to do is create a detailed step-by-step protocol for starting with a swab in hand and finishing with a Genome Announcement publication describing the genome assembly. In order to achieve this goal, we have a student, Madison Duntiz, here at UC Davis who is going to repeat the process from start to finish using some microbes left over from our Project MERCCURI collections. Along the way she will document everything in detail and we will publish the results here on microBEnet for anyone to use.

While waiting for that to finish up, I thought I would at least post the outline of the steps that we would recommend for a similar project. Obviously this is lacking a lot of detail, but I’d be happy to answer any questions while we work on the detailed version.

Basic outline of the protocol

-Collect microbes from your favorite built environment using sterile swabs

-Swab onto solid media plate, and grow the swabs in liquid to be plated out as well (note that the temperature of incubation and the type of media used will strongly influence the kinds of bugs you find)

-Dilution streak colonies of interest. Dilution streak again (having a mixed culture is bad news)

-Grow colonies up as overnight cultures

-Perform colony PCR using 16S primers directly on the bugs from the overnight cultures. The resulting PCR fragments get cleaned and then sent for Sanger sequencing either at a University or an outside company.

-Trim and align the resulting reads, and BLAST the consensus sequences to identify the organisms. In most cases you’ll probably also have to make a phylogenetic tree of the results in order to accurately identify the bugs. Choose a bug whose genome has not already been sequenced.

-Take that overnight culture and extract genomic DNA. Where you go from here depends on your resources and budget. Some people might give this DNA directly to a sequencing center; others (such as ourselves) might choose to make sequencing libraries themselves.

-Create sequencing libraries, preferably using a kit although there are other options.

-Confirm the quality of the sequencing libraries and normalize between libraries using qPCR. Submit the barcoded libraries for Illumina sequencing.

-Demultiplex the resulting reads and mentally prepare yourself for genome assembly.

-The process of trimming, error-correcting, assembling, scaffolding, and verifying the assembly is a whole field unto itself. However, to avoid this morass we used the super awesome A5 Assembly pipeline which does all of those steps for you and creates really high-quality assemblies to boot (full disclosure, this was developed in our lab… but is free, open-source, and easy to install and use).

-Submit the completed assembly to RAST for gene annotation.

-Submit the assembly to the NCBI, submit the reads to either SRA or someplace like Figshare.

-Take the information about the bug, the data from the assembly, the data from RAST and put together a Genome Announcements publication. Don’t forget you can’t submit the publication until you have an Accession # from NCBI.

-Submit the paper

-Once the paper is accepted, share your results with the world. Blog about it and enter the genome into the GOLD database.
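
As one illustration of the outline above, the demultiplexing step can be sketched in a few lines of Python. This is only a minimal sketch, assuming each read arrives paired with its index (barcode) sequence and that barcodes must match exactly. The function name, sample names, and barcodes are all hypothetical, and in practice the sequencing facility's own software usually handles this step.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact match on the index sequence.

    reads: iterable of (index_sequence, read_sequence) tuples
    barcode_to_sample: dict mapping barcode -> sample name (hypothetical)
    Reads whose index matches no known barcode go to "undetermined".
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        sample = barcode_to_sample.get(index_seq, "undetermined")
        bins[sample].append(read_seq)
    return dict(bins)


# Made-up barcodes and reads, for illustration only.
barcodes = {"ATCACG": "sample_A", "CGATGT": "sample_B"}
reads = [
    ("ATCACG", "ACGTACGT"),
    ("CGATGT", "TTGACCAA"),
    ("NNNNNN", "GGGGCCCC"),
    ("ATCACG", "TTTTAAAA"),
]
binned = demultiplex(reads, barcodes)
for sample, seqs in binned.items():
    print(sample, len(seqs))
```

Exact matching is the simplest choice; real tools typically tolerate a mismatch or two in the barcode, which this sketch does not attempt.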

Curtobacterium flaccumfaciens paper out (Jennifer Flanagan)

The second-to-last undergraduate genome paper is out. Curtobacterium flaccumfaciens by Jennifer Flanagan.

Two more papers out: Jessica and Amanda

The days of this blog are numbered… There are only a couple more papers to come out (both accepted) and then this project will be officially completed! I’ll write a summary and reflections on the whole process at that time.

Meanwhile, congrats to Jessica and Amanda whose papers came out this week:

Jessica’s Kocuria paper

Amanda’s Dietzia paper

Zach presenting his genome project work at the Undergraduate Research Conference

Here’s a picture of Zach talking about his search for the Microbacterium genome

Genome papers published: Brachybacterium muris UCD-AY4 and Microbacterium sp UCD-TDU

When we first started this genome sequencing project in Jan 2012 we had hopes of wrapping up the project by Spring and getting the papers out that summer. Turns out it was a bit more complicated than we thought. The first two papers, by Jonathon and Zach, came out today. The other 4 are in various stages of processing (waiting on GenBank).

Congrats guys!

http://genomea.asm.org/content/1/2/e00086-13.full

http://genomea.asm.org/content/1/2/e00120-13.full

Advice on asking for letters of recommendation (updated May 2013)

This is based off an e-mail I sent recently to a student and someone suggested I post it here:

Asking for letters of recommendation

In general, I and others are happy to write letters of recommendation for people… it’s part of our jobs after all. However, there are some tips I would offer anyone soliciting letters at any stage of their career.

1) Don’t ask me for a letter only a few days before it’s due. This seems like such a simple concept but one that is violated so often.

2) If you ask me for a letter, you need to send a copy of your CV. No matter how long I’ve worked with you, there’s probably still information in there I didn’t know and this helps me write a letter that doesn’t sound like a form letter.

3) Send me a description of the program you’re applying for and why. Again, this helps me write a better letter and doesn’t force me to trawl the internet for information.

4) Make it as easy as possible for me to write the letter! This is especially critical with professors. If the letter needs to be mailed, you should hand me a stamped, already addressed envelope so all I have to do is drop in a letter and throw it in the outgoing mail. If it’s an electronic form, provide me with detailed instructions and links.

5) Don’t attempt to bribe me. I’m not kidding… for example once I got a handwritten request for a letter of recommendation along with $50. This is not a good idea!

(Updated with two more in May 2013)

6) Before asking a post-doc or a project scientist for a letter of recommendation make sure that you don’t actually need one from the professor.

7) Don’t list me as a reference for anything without at least asking first.

Genome Project Documentary finally finished!

After collecting dust for some months I’ve finally put together the (4-minute) documentary on the Undergraduate Genome Project. The idea here was to film the students throughout the project and create something that would give a basic idea of what they accomplished.

Soon I’ll be posting more about how the project went, but for now check out the video here

TDU (M. oxydans) info, and what lies ahead

The ominous clouds in the sky and the cool breeze in the air can only mean one thing – Summer is gone and it’s time to stop slacking off. With that in mind, I thought I would update everyone on how my work on Microbacterium oxydans has progressed. Now that the bioinformatics work is more or less done, I’ve been putting all of the information together and trying to make it sound somewhat coherent. I am currently halfway to the 500 word limit that the Journal of Bacteriology has imposed for genome announcements, so the end is definitely in sight but more work still needs to be done. It’s going to be interesting managing classes, this paper, and another project I’m working on (more on that later) but I’m optimistic that the paper will be completed within the next week or two.

M. oxydans info that I’ve written about so far:

-Microbacterium have been found in many human-associated environments (blood, radioactive sites, food) and also act as plant pathogens in some cases. TDU was found in a toilet (thanks David!)
-The a5 pipeline produced 1,131,749 paired-end reads with an average length of 133 base pairs, giving 80.358-fold coverage after error checking and quality control were performed. The N50 value of the genome is 1,056,891 bp (this is a weighted median: the size such that 50% of all bases are contained in contigs of that size or greater). I still need to include scaffold and contig length information, such as median and mean values.
-The size of the genome is 3,746,321 bp, comprising 44 contigs in 8 scaffolds.
-RAST was used to automatically predict the genes present in the TDU sequence. The default RAST gene model was used to make these predictions. The program predicted 3,667 coding regions and 357 subsystems. I will probably go more in depth into this area of the paper.
-A phylogenetic tree of 16S sequences was created to confirm TDU as M. oxydans. I still need to include information about percent identity.
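
For readers unfamiliar with the N50 and fold-coverage figures quoted above, here is a minimal Python sketch of how such numbers can be computed. It is an illustration under stated assumptions, not the a5 pipeline's own code. The function names and the toy lengths are made up, and the coverage formula assumes that both reads of each pair are counted. With the read count, read length, and genome size from this post, that assumption gives a value close to the reported 80.358, which suggests, but does not confirm, that the figure was derived this way.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def fold_coverage(n_pairs, mean_read_length, genome_size):
    """Approximate fold coverage, counting both reads in each pair."""
    return 2 * n_pairs * mean_read_length / genome_size


# Toy contig lengths (made up): total is 300, so half is 150.
# 100 alone is below 150; 100 + 80 = 180 reaches it, so the N50 is 80.
toy_lengths = [100, 80, 50, 30, 20, 20]
print(n50(toy_lengths))

# Numbers taken from the post: read pairs, mean read length, genome size.
print(round(fold_coverage(1_131_749, 133, 3_746_321), 2))
```

Sorting from longest to shortest and stopping at the halfway point is what makes the N50 a "weighted median": long sequences count for more because they hold more bases.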

Once this is all taken care of, I will be undertaking a new, independent project through the Provost’s Undergraduate Fellowship. My proposed project will deal with milk pasteurization with regard to cheese safety and quality. I will be creating cheeses using pasteurized cow’s milk and raw, unpasteurized cow’s milk and assessing the microbial content of the final products to see if there is a correlation between pasteurized milk and cheese safety. I will look for and attempt to identify and quantify known cheese pathogens, such as E. coli, Salmonella, and Listeria monocytogenes. The debate between raw milk and pasteurized milk has been ongoing, but with this study I can hopefully provide a little bit of insight into the controversy. Right now I’m writing my proposal (due in 3 weeks, yikes!) and doing hours of research to make sure I am prepared to undertake this project, but it’s still early and I have a lot to learn.

UPDATE (10/15) – So I was able to put some more time into the paper and it looks like I’ve gotten all of the information I need down. I’ll be sending it off to David so he can critique it.

Curtobacterium flaccumfaciens (AKU)

Hi, this is a really long overdue post, but I figured it was about time to start blogging. My name is Jennifer and I am a third-year Cell Biology major, double majoring in Communications. I am slightly new to the lab, having jumped onto the Undergraduate Genome Project in late May. When I jumped on, I was given the task of rescuing the lost Curtobacterium from a myriad of petri dishes. After making sure we had a glycerol stock and running the 16S sequence through BLAST, we identified the microbe as Curtobacterium flaccumfaciens (which I will henceforth refer to as AKU), a Gram-positive soil bacterium and known plant pathogen. AKU could easily be identified as a phylogenetically informative microbe because, after checking RAST, we saw that neither Curtobacterium flaccumfaciens nor any other Curtobacterium species had been sequenced.

However, everything afterwards proved to be far more difficult. After many failed library preps, particularly in the qPCR step (we weren’t getting enough active DNA), we decided to combine a TruSeq library and a Nextera library, hoping that their biases would offset each other. The bias mostly arose from the fact that we had to PCR 18 times for the Nextera library to get enough DNA, which would definitely bias the reads for a microbe with as high a GC content as AKU. We were eventually able to sequence both AKU and THP (Dietzia) with the new 250 bp read MiSeq. However, we found that there was a large amount of E. coli contamination, an unfortunate side product of Nextera libraries. We were eventually able to throw out most of the E. coli reads by lowering the stringency A5 uses to determine what is “trash” DNA. We also found that A5 can only accommodate 160 bp reads, so for now we are using a trimmer that cuts off 90 bp so that we are at least able to run the assembly and not have it crash. We are hoping that we can somehow include those 90 bp that we chopped off and, even better, be able to run A5 with 250 bp reads.

I will update you on more failures and triumphs in the near future!
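
To make the trimming step concrete, here is a minimal Python sketch of cutting 250 bp reads down to 160 bp. It is not the trimmer we actually used. The function name is hypothetical, and the sketch assumes plain 4-line FASTQ records and simply keeps the first 160 bases of each read along with the matching quality scores.

```python
def trim_fastq_records(lines, max_len=160):
    """Yield FASTQ lines with sequence and quality truncated to max_len.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header
            yield seq[:max_len]
            yield plus
            yield qual[:max_len]
            record = []


# A made-up 250-base read with uniform quality scores.
example = ["@read1", "ACGTA" * 50, "+", "I" * 250]
trimmed = list(trim_fastq_records(example))
print(len(trimmed[1]), len(trimmed[3]))
```

Trimming the sequence and quality lines to the same length matters, because a FASTQ record with mismatched lengths will be rejected by most downstream tools.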
