Tag Archives: Undergraduate Genome Project

No Luck with TEU

Sadly, the results from TEU are discouraging: we cannot go forward and write a paper with the data we have. What we thought was TEU turned out to be contaminated with another microbe. We know this because we returned to our original glycerol stock sample and obtained a 16S sequence that confirmed it. As a result, I will be helping Hannah write the paper for her organism, Leucobacter.

Microbacterium oxydans Data

I finally found a computer with some internet access (I just moved apartments and have no internet or television for a week), so I decided to update everyone on the status of my organism, Microbacterium oxydans (TDU).

The library construction went very well, producing a genome with only 44 contigs contained within 8 scaffolds. The total length of the genome was 3,746,321 base pairs, which is similar to the lengths of other Microbacterium genomes (3,982,034 for M. testaceum, 3,952,501 for M. yannicii). I used the RAST genome annotation program to identify genes within my genome. According to the annotation pipeline, the genome contained 355 subsystems and 3,667 coding sequences.
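To put "similar" into numbers, here is a small sketch that expresses the TDU assembly length as a fraction of each reference length quoted above. The dictionary and function names are just illustrative.

```python
# Genome lengths in base pairs, as quoted in the post.
TDU_LENGTH_BP = 3_746_321
REFERENCE_LENGTHS_BP = {
    "M. testaceum": 3_982_034,
    "M. yannicii": 3_952_501,
}


def relative_size(query_bp, reference_bp):
    """Return the query genome length as a fraction of the reference length."""
    return query_bp / reference_bp


ratios = {
    name: relative_size(TDU_LENGTH_BP, length)
    for name, length in REFERENCE_LENGTHS_BP.items()
}

for name, ratio in ratios.items():
    print(f"TDU is {ratio:.1%} the length of {name}")
```

Both ratios come out a little under 95%, which is the sense in which the lengths are "similar".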

Since this isn’t my computer I will have to end this post prematurely for now. Once I can get some more sustained internet time I will provide additional information about this organism.

Data, Data, Data!

It’s been a while since our last post but we finally (after many technical challenges and more than our share of bad luck) have data and are in the process of analyzing it!

I’ve worked in other labs before this one and I always forget how much fun it is to finally take a step back and analyze the data you’ve been working so hard to obtain. I thought I’d share some of what we’re up to with you.

Demultiplexing and Assembly

The data generated by the Illumina sequencing takes the form of many thousands of short reads (about 600 bp each). The sequencer also performs some preliminary error-checking and clean-up on the reads so the sequence is easier to work with. Since we pooled our samples into one well, our first step was to separate each set of reads by barcode. This is also called demultiplexing the data.
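For the curious, here is a toy sketch of what demultiplexing does. It assumes, for simplicity, that the barcode is the first few bases of each read; real pipelines often read the barcode separately. The barcodes and sample names below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical barcodes and sample names, for illustration only.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes):
    """Group reads by the barcode found at the start of each read.

    Reads with a known barcode are stored with the barcode trimmed off.
    Reads with an unrecognized barcode are kept whole under 'unassigned'.
    """
    barcode_len = len(next(iter(barcodes)))  # assumes all barcodes share one length
    bins = defaultdict(list)
    for read in reads:
        prefix = read[:barcode_len]
        if prefix in barcodes:
            bins[barcodes[prefix]].append(read[barcode_len:])
        else:
            bins["unassigned"].append(read)
    return dict(bins)


reads = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCAAA", "GGGGTTTTAA"]
result = demultiplex(reads, BARCODES)
```

Note that a read carrying a barcode the demultiplexer was not told about lands in the "unassigned" bin rather than disappearing, so it is worth looking there when a sample seems to be missing.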

Unfortunately, none of my reads showed up in the demultiplexed data from the sequencer. When we went back and re-ran the demultiplexer on the raw, pre-processed data, we found that the THU reads were present but had been thrown out as errors because they carried the barcode previously assigned to Amanda (whose library was not being sequenced in this particular run). We concluded that this was most likely due to a mix-up during the library preparation process, and later we verified that the reads were truly THU using a whole-genome BLAST.

After demultiplexing, we used an assembly pipeline called the A5 pipeline (a piece of software developed in the Eisen lab) to assemble the reads into contigs and then scaffolds. Contigs are small sections of DNA that have been compiled by aligning reads next to each other using overlapping regions as a guide. Scaffolds are even larger aligned sections of DNA that are made up of contigs. (Nature Education has a helpful diagram here: http://www.nature.com/scitable/content/anatomy-of-whole-genome-assembly-20429)
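To make the idea of "aligning reads using overlapping regions" concrete, here is a toy sketch that joins two reads when the end of one exactly matches the start of the other. This is not how the A5 pipeline works internally (real assemblers handle sequencing errors, repeats and millions of reads); the function name and the minimum overlap of 3 bases are assumptions chosen for illustration.

```python
def merge_by_overlap(left, right, min_overlap=3):
    """Merge two reads if a suffix of `left` equals a prefix of `right`.

    Tries the longest possible overlap first. Returns the merged sequence,
    or None if no exact overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    return None


# The last four bases of the first read (CGTA) match the first four of the second.
contig = merge_by_overlap("ATGGCGTA", "CGTATTGC")
print(contig)
```

Repeating this kind of merge across many reads is what grows a contig; scaffolds then order and orient contigs relative to each other.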

Annotation

Once the draft genome was assembled into scaffolds, we submitted the scaffold data to RAST, a genome annotator. Genome annotation software such as RAST searches submitted DNA sequences to identify known genes and gene families. It also has a tool for comparing genomes to each other. Below is a summary of the RAST annotation of my organism.

I still have a lot of analyzing left to do, but it’s wonderful to finally be at this step!

It’s Library Preparation Time!

Now that we have chosen our candidates we are in the process of preparing libraries for sequencing. I’ve learned a lot about this process in the past few weeks so I thought I’d share some of what I’ve learned.

First, what is a “genomic library” anyway?

“Genomic library” is the term used to describe the prepared genomic DNA that is sent to the Illumina sequencer for sequencing. Library preparation is a critical step because the quality of a library preparation often determines the quality of the sequencing and the ease of assembly.[i]

How does one prepare a genomic library?

Although there are many different methods to choose from in library preparation, all methods have the same two basic goals.

  1.  To cut the DNA into small pieces. The size of the pieces depends on the type of sequencing you are trying to do and the purpose of the sequencing. In our case, we want pieces averaging 500 base pairs that are at maximum 800 base pairs. [ii]
  2. To add adapters to each piece.

The differences in library preparation methods are largely differences in the mechanisms by which these two goals are accomplished. For example, the DNA can be chopped enzymatically or mechanically or the adapters can be added by one or a number of enzymatic steps.

Pros and Cons of Library Preparation Methods:

Each step of each preparation method has various advantages and disadvantages associated with it. The primary factors for concern in library preparation are:

  • Amount of genomic DNA required – in general, the more steps involved in a preparation technique, the more genomic DNA will be required because some DNA will be lost at each step.
  • Cutting bias – certain cutting techniques may be biased depending on the DNA sequence. This is generally more of a concern in enzymatic cutting than in mechanical cutting.
  • G-C content – Amplification steps (i.e. PCR in a thermocycler) tend to change the average G-C content of the DNA sample by preferentially amplifying sequences based on the amount of guanine and cytosine in them. In general, using fewer amplification steps will decrease this bias. [iii]
  • Price – the preparation methods vary widely in price, and this can be a limiting factor.
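Since G-C content comes up in the list above, here is a minimal sketch of how it is calculated: the fraction of bases in a sequence that are G or C. The function name is illustrative, and the sketch assumes a plain string of A, C, G and T.

```python
def gc_content(sequence):
    """Return the fraction of bases in `sequence` that are G or C.

    Case-insensitive. Returns 0.0 for an empty sequence.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)


print(f"{gc_content('GGCCAATT'):.0%}")
```

Comparing this number before and after an amplification step is one simple way to see whether the step has skewed the sample.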

Our Methods:

For our libraries we will be using sonication (sound) to chop up the genomic DNA, followed by a series of enzyme treatments from the Illumina library preparation kit that will first prepare the DNA pieces for annealing the adapters and then carry out the annealing process itself.

The adapters we are using will each contain a “barcode,” a short sequence of bases unique to each sample. Barcoding allows us to pool our samples and run them on a single Illumina well, bringing down the cost of sequencing significantly.

Once we have the sequences back, we will begin the computationally challenging process of assembling and annotating them.


[i] Monya Baker, “De novo genome assembly: what every biologist should know,” Nature Methods 9.4 (2012): 333-337. http://www.nature.com/nmeth/journal/v9/n4/full/nmeth.1935.html?WT.ec_id=NMETH-201204

[ii] More information about how the Illumina sequencing reaction works can be found here: http://seqanswers.com/forums/showthread.php?t=21 and here: http://www.brown.edu/Research/CGP/core/illumina/overview

[iii] Adey, Andrew; Morrison, Hilary; Asan, Xu Xun  “Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition.” http://genomebiology.com/2010/11/12/R119

Some information on the Microbacterium genus

Our genomic library preparation kit finally came in (about a week late), but neither of the mentors is around to explain how to use it, so I thought I’d do some research and see what I could find on the Microbacterium genus. Unfortunately, most Google results are for the Mycobacterium genus which, while a very fascinating genus I’m sure, does not help me at all.

I finally came across a study performed in the mid-1960s that analyzed some characteristics of the Microbacterium genus. The term Microbacterium was proposed in the early 20th century to identify a particular group of very small, gram-positive, rod-shaped bacteria that have been found in many dairy products. An important characteristic of Microbacterium species is that they are unusually heat resistant. The researchers conducted a number of tests on 25 unique strains to learn more about the physiological characteristics of the genus.

Particularly of interest to me is the following:

  • These bacteria grow aerobically, but some strains can grow under anaerobic conditions. However, they divide much more slowly and lack pigmentation when grown anaerobically. Each strain in the study grew at a decent rate (typically around 3 days of incubation, but as many as 7 days were needed in some cases) when incubated in the 30-37 degree C range. Only 5 strains grew at 39 degrees and only 3 strains grew at 9 degrees.
  • All strains grew in media with a pH of 6.8 and 7.5. More acidic conditions (pH < 6) yielded almost no growth from the strains.
  • Some members of the genus have the ability to reduce nitrates and liquefy casein, while nearly every strain could hydrolyze gelatin.
  • 5 strains were able to withstand 85 degree heat, and every strain was able to withstand at least 60 degree heat.
  • There is doubt as to where this genus should be placed in a phylogenetic tree. At the time of publication of this study, the most recent proposal was to place Microbacterium within the Corynebacterium. The author, however, dismissed this based on what he observed in terms of heat resistance and ability to be cultured, noting large differences between the two.

Here is a link to the paper

Kocuria rosea, I choose you!

In a previous post, I noted that I was working on identifying some environmental isolates I took from various locations. When I got my 16S sequences back, I had the ever so popular Micrococcus, Staphylococcus, and Bacillus bacteria in most of my samples. Two that stood out were Kocuria kristinae (OTW) and Kocuria rosea (OTCP). Of the Kocuria species, only one genome has been completed and published, Kocuria rhizophila. One other is in permanent draft, and another three are targeted. By running BLAST on the 16S genes of my two samples, I determined that both are 96% identical at the 16S gene level to the published species. Because of this difference, both samples are fairly distantly related to Kocuria rhizophila.
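As a rough illustration of what a percent identity figure means, here is a sketch that compares two sequences that have already been aligned (same length, with "-" marking gaps). It assumes the simple convention of dividing exact matches by the full alignment length, so gap columns count as mismatches; the function name and example sequences are made up, and this is not the BLAST program itself.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same base.

    Both inputs must already be aligned to the same length, with '-' for gaps.
    Columns containing a gap count toward the length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Nine of the ten aligned positions match.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

On a full-length 16S gene of roughly 1,500 bases, 96% identity corresponds to about 60 differing positions.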

Now came the hard question of which one to use for library construction. The two are very closely related to each other and found in the same general type of environment. I checked the genomic DNA concentration of the two samples and it turned out that I did not have much genomic DNA of Kocuria kristinae but had plenty of genomic DNA (about 785 ng/µL) of Kocuria rosea. Thus, it was more practical for me to use Kocuria rosea for my library construction project.

To give you a little information on Kocuria rosea, it is a soil bacterium that has been found in various locations such as polluted soil, indoor environments, deep sea sediments, and a spacecraft. Mostly, though, it is isolated from soil and water. A paper online has also associated it with a catheter-related infection.

Candidate for Sequencing- THP

It is very exciting that I have finally found an organism worth sequencing. After submitting 9 different samples to be sequenced, I have obtained a potential candidate to be fully sequenced, and that organism is named THP. THP stands for toilet handle pink colony. The original sample for THP was taken from the toilet handle in my apartment bathroom. After running the sequence through BLAST, I discovered that the sequence belongs to an unknown species in the genus Dietzia.

After finding this information, I looked for how many completed, incomplete and targeted projects there were in GOLD. I discovered that there is one completed project for Dietzia alimentaria 72 and an incomplete project for a different species, Dietzia cinnamea. Although two species in this genus have already been sequenced or will be sequenced, I think that THP will be a good candidate to sequence because it could be a new species. If THP is a new species, we can use the sequence to compare and contrast with both alimentaria and cinnamea. In addition, this unknown species could tell us more about organisms that live on things we use on a daily basis.

The next thing David and I did was track down the sequences of both Dietzia alimentaria and cinnamea and match those sequences against the THP sequence. After further analyzing the sequences, we discovered that alimentaria 72 has 98.1% identity to ours (67% GC) and cinnamea has 97.6% identity to ours (70.9% GC). These two species are about 97% identical to each other. Dietzia alimentaria is from traditional fermented Korean food and Dietzia cinnamea is found in petroleum-contaminated soil in Brazil. Some interesting information about Dietzia cinnamea is that it is able to degrade petroleum hydrocarbons. I found it really interesting that these two species are in the same genus because they are found in very different environments.

As of right now, the genomic prep I had originally made of THP did not have enough genomic DNA to begin constructing a genomic library. Therefore I am in the process of making a new genomic prep, hopefully with an abundant amount of DNA.

Candidate Sequencing Organism – TDU (Microbacterium oxydans)

Micrococcus luteus wasn’t interesting enough to warrant further analysis, so I have picked another organism, TDU, to begin constructing a genomic library of. It appears to be within the Microbacterium genus, and appears to be an identical match to the species oxydans. We actually isolated several Microbacterium colonies throughout this project from different sources, so I had a number of samples from which I could choose to move forward.

David Coil built a phylogeny of the different Microbacterium samples we isolated to help me visualize how similar the samples are to the published genome. There is one completed and published genome in the Microbacterium genus, for the species testaceum, so the goal of the tree was to help me pick from the most divergent organisms to minimize the chances of a duplicate publication of the same organism’s genome. This tree shows how similar the Microbacterium isolates we found are to the published Microbacterium testaceum genome, with the most divergent organisms appearing to the left. UPDATE: Two outgroups have been added to further illustrate the degree of divergence.

M. testaceum 16S sequence_alignment_tree

The three samples I picked as the best candidates were AV2, TDU, and TFU (isolating TJU was a difficult and sloppy process, so I played it safe and avoided it altogether). AV2 had very, very low concentrations of DNA in the genomic preparation, so the sample was discarded. TDU and TFU both contained high levels of genomic DNA in their genomic preparations, so both were still equally viable as candidates. When I checked the glycerol stocks of both organisms on plates, however, TFU appeared to have slight contamination (which is really bad, considering these stocks are our last resource for obtaining pure samples of these organisms). This confirmed TDU as the Microbacterium oxydans sample from which I will begin constructing a library.

Currently, the dilution streak of TDU is incubating at 37 degrees C, and tomorrow I will begin the process of confirming the glycerol stock and begin the tagmentation reactions for the genomic library.

Candidate in the Spotlight: THU

A couple posts ago I introduced you to many of our potential candidates for sequencing. As it turns out, the organism that I will be sequencing is THU (more about the name here), an organism that was already in the pilot project stage and one that is a very strong candidate for sequencing.

What do we look for in a candidate organism?

Since the goal of this project is to add useful genomes to the reference genome database, it is important that we choose organisms that have not been sequenced before and are relatively abundant without being too common. Genomes of organisms that are very common can be reconstructed from metagenomic samples so it is less important that we sequence their genomes. On the other hand, organisms that are very rare may not be characteristic of the built environment ecosystem at all.  Although rare organisms of the built environment will be interesting to sequence in the future, they are not our priority right now.

The case of THU

THU is from the Leucobacter clade and is a close relative of a strain known as Leucobacter chironomi MM2LB.

BLAST-generated 16s tree of THU (THU highlighted in yellow)

Leucobacter is a group within the family Microbacteriaceae and is characterized by the presence of 2,4-diaminobutyric acid in peptidoglycan. Leucobacter have been found in many different environments including soil, chromium-contaminated wastewater, nematode guts, potato leaves and the eggs of chironomid midges (family Chironomidae). Some members of the clade are chromium-resistant and have been found in chromium-rich environments.1

The organisms THU is most closely related to have been associated with a variety of human-related activities and built environments in the past (e.g. activated sludge and industrial wastewater). Many other organisms from Leucobacter have also been found in built environments (fuel tanks, duck barns and biogas systems)2. However, although present in these environments, Leucobacter is generally not overly prevalent2.

Although there are no completed or permanent draft genomes in the GOLD database, we did find a draft genome for one species of Leucobacter (Leucobacter chromiiresistens) through a Google Scholar search.3 To our knowledge no other genomes have been published for the group.

Because of Leucobacter’s association with the built environment, its level of abundance in these environments, and the lack of many published genomes, we’ve concluded that THU is a strong candidate organism.

Edit 5/14/12:

A note about naming:

“THU” is a three-letter code that we originally used to designate our organisms. The first letter designates the environment from which the organism was isolated (in this case a residential toilet), the second designates the specific organism identifier (organism “number” H from the toilet samples) and the third designates the number of lab generations (in this case, unknown).

It is important to note that our naming system evolved as we learned what information was most relevant to our work. Many of the later isolates do not follow the same code as THU (we left off the generation number, for example, in many of the later samples because we discovered it was not important to keep track of for our purposes).
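For anyone who likes to see things spelled out, here is a small sketch of how the original three-letter scheme could be decoded. The lookup table only contains the one environment letter explained above, and reading "U" as "unknown generation" is my interpretation of the THU example; the names in the code are illustrative, and later isolates that do not follow this scheme would not decode meaningfully.

```python
from typing import NamedTuple

# Only "T" (residential toilet) is explained in the post; other letters are not listed here.
ENVIRONMENTS = {"T": "residential toilet"}


class IsolateCode(NamedTuple):
    environment: str
    organism_id: str
    generation: str


def parse_isolate_code(code):
    """Split a three-letter isolate code into its three fields.

    Assumes the original scheme: environment, organism identifier, generation.
    Treats a generation of 'U' as 'unknown' (an assumption based on THU).
    """
    if len(code) != 3:
        raise ValueError("expected a three-letter code such as 'THU'")
    env, organism, generation = code.upper()
    return IsolateCode(
        environment=ENVIRONMENTS.get(env, "unrecorded environment"),
        organism_id=organism,
        generation="unknown" if generation == "U" else generation,
    )


print(parse_isolate_code("THU"))
```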

References:

1. Halpern, Malka, Shaked, Tamar, Pukall, Rudiger et al. (2009). Leucobacter chironomi sp. nov., a chromate-resistant bacterium isolated from a chironomid egg mass. International Journal of Systematic and Evolutionary Microbiology, 665–670. http://ijsb.sgmjournals.org/content/59/4/665.full#cited-by

2. http://www.sciencedirect.com/science/article/pii/S0048969706001197

3. http://scholar.google.com/scholar?q=leucobacter+genome&hl=en&btnG=Search&as_sdt=1%2C5&as_sdtp=on

Guest post: Paul Orwin at CSUSB on another undergraduate genome sequencing project

(Cross-posted from the microBEnet blog.)

Guest post by Paul Orwin, Associate Professor of Biology, CSU San Bernardino.

CSUSB students

The California State University at San Bernardino is a regional comprehensive university in the Inland Empire region of California (Riverside and San Bernardino Counties). It is one of only two (the other being UC Riverside) public universities serving this region. As a master’s level school, we have a diverse student body from a lot of different educational and ethnic backgrounds. Designing courses for this group of students is a challenge! Fortunately, by the time they reach the upper division Biology courses, they have had a thorough grounding in biological sciences and chemistry. This makes my task in putting together this course easier. Many of the students see themselves following a health professions route in the future, including Medical, Dental, and Pharmacy school as well as some interested in Ph.D. studies. So there is a lot of demand for a course in Medical Microbiology, but I wanted to spice things up a bit! I decided that where I could do that was in the laboratory segment of the class, by including an enrichment and isolation experiment along with the traditional clinical microbiology diagnostic experiments.

I first need to explain where the idea for this course came from. For several years I have been taking students from my lab group to the International Conference on Microbial Genomics held at Lake Arrowhead, CA every other year. This is a fantastic meeting, organized by Jeffrey H. Miller (gotta be careful with those middle initials!) at UCLA. In a not terribly surprising coincidence, Jonathan Eisen (who got me interested in microBEnet) and Ashlee Earl (who I don’t think has a web site, and who will appear later in this story) are involved in organizing it this year. At this meeting I learned a great deal about genomics and metagenomics, and got interested in the idea of incorporating this type of work into the classroom based on the work Jeffrey Miller and Erin Sanders were doing with UCLA Microbiology undergraduates. One year they reported on their efforts to sequence and annotate the genome of a novel microorganism and another time Erin’s class put up posters describing the phage they identified, sequenced, and annotated. They wrote a textbook about this work, which goes to show they are dedicated to this idea! As we will see, I have not gotten nearly that far in my own efforts.

Another source of inspiration for this work was the class on enrichment and isolation strategies from the environment (including the Built Environment, incidentally) that Jared Leadbetter taught at Caltech when I was working with him. The inventiveness of these students was remarkable, as was the frequency with which they were successful. Of course, he has forgotten more microbiology than I will ever know, which probably helps. After I started my own faculty journey, I drew on this inspiration as well as many conversations in various forms with Mark Martin (a true microbial supremacist) to develop an enrichment and isolation approach for my general Microbiology course. Mark and Jared (and others) have inspired me to think about culture techniques, and about the claim that much of the microbiome is “unculturable” (preposterous, IMHO).

Human microbiome

The final person who got me interested in this is the aforementioned Dr. Ashlee Earl, who presented some work on the Human Microbiome Project at the last ASM general meeting. On her poster, she described how the HMP had identified a group of 100 most wanted organisms – organisms that they wanted other labs and research groups to isolate so that a good set of reference genomes could be developed. This served as the jumping off point for my course design (if you can call it that).

Ok, the name dropping is out of the way (or the giving credit where credit is due, if you prefer), so on to the class itself. It is a class in Medical Microbiology, with the lecture based on Mims’ Medical Microbiology. The lab is based on enrichment and isolation techniques, bringing together classical clinical microbiology tests (metabolic testing, serotyping, and staining) with 16S rRNA sequence analysis for identification purposes. The idea here is to teach the students how to use these techniques for two major things a medical microbiologist might do – identify a known pathogen by rapid testing procedures, or identify and classify an unknown organism associated with a pathology.

The first half of the course (which we have just completed) involved identifying organisms from a mixed culture (which I gave to the students) based on traditional microbiological techniques. This identification was complemented with a 16S rRNA experiment, which also served as an introduction/refresher on basic molecular biology techniques (PCR, gel electrophoresis, DNA extraction). When the DNA sequences are returned to us from the sequencing facility, we will be analyzing them using the RDP database. This will also give us a chance to discuss error in sequencing and PCR, as well as the difference between identifying and classifying. Hopefully they get the same thing from the sequencing as they got from the culture tests!

We have spent a good deal of time discussing the idea of enrichment and isolation, and how this can be applied to the Human Microbiome. They have seen the immense diversity of the microbiome (cite) as well as the difference between what is there and what is published. To prepare them for the task, I used the HMP table that lists off the organisms identified from various body sites and categorizes them as “Most Wanted, Medium Priority, and Low Priority.” I just gave the students the “Most Wanted” organisms to work with, and to make things a little more comfortable for them I eliminated the stool sample organisms. I then proposed several options to them.

1) Everyone could agree on a single target to isolate, and we could design a number of different media to try to enrich for and isolate these bacteria.

2) Everyone could go their own way, picking individual organisms and designing experiments to enrich, isolate, and identify them.

CSUSB students

In the end, several students chose to go their own way, while a number chose to focus on one group (the oral actinomycetes) and come up with multiple different approaches to isolate these bacteria. They all did background research on what is known about culturing these organisms from various sources, and we all agreed on using three complementary strategies to enrich and isolate these bacteria. The first approach is traditional enrichment: based on known characteristics of the species, design media (like potato agar) that encourage actinomycetes to grow. The second is to use the desiccation tolerance of the actinomycete spore as a strong selection against other vegetative cells (this is riskier, since there are endospore formers present as well, and we don’t know if the actinomycetes in the oral microbiota sporulate). The final approach is my personal favorite – the oligotrophy approach. First, put them on media with nothing in it (perhaps trace minerals). Let microcolonies form on that plate, then pick the microcolonies onto separate “nothing medium” and let them grow in isolation. Finally, carefully pick them onto rich medium (or maybe just a bit richer medium, like 100 mg/L YE) to let them grow big enough to test. If they grow well on rich medium, we can do biochemical tests, or we can just go with the molecular identification. I can’t take credit for this idea (I first heard it from Jared), but I’ve used it a few times and I like it. It helps find bacteria that don’t grow very fast on rich media, or get outcompeted by the boring old familiars on typical clinical microbiology medium.

So that’s where we are right now, with IRB approval in hand, ready to embark on the adventure. I think it will work, and I’m sure that we will all learn something! Sadly I am not enough of a tech geek (yet!) to have the students blogging or tweeting the experience. Maybe next year…