Data, Data, Data!

It’s been a while since our last post but we finally (after many technical challenges and more than our share of bad luck) have data and are in the process of analyzing it!

I’ve worked in other labs before this one and I always forget how much fun it is to finally take a step back and analyze the data you’ve been working so hard to obtain. I thought I’d share some of what we’re up to with you.

Demultiplexing and Assembly

The data generated by Illumina sequencing take the form of many thousands of short reads (about 600bp each). The sequencer also performs some preliminary error-checking and clean-up on the reads so the sequence is easier to work with. Since we pooled our samples into one well, our first step was to separate the reads by barcode, a process also called demultiplexing.
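To make the idea concrete, here is a minimal sketch of demultiplexing in Python. It is only an illustration, not the software the sequencer runs: it assumes each read begins with its barcode and that barcodes must match exactly, and the barcode sequences and sample names are made up for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Sort reads into per-sample bins using the barcode at the start of each read.

    Reads whose prefix matches no known barcode go into an "unassigned" bin.
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                # Trim the barcode so only the genomic sequence is kept.
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
reads = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTTTTAAA", "GGGGCCCCAA"]
bins = demultiplex(reads, barcodes)
```

Note that a read carrying a barcode that is not in the expected list ends up in the "unassigned" bin, so a barcode mix-up can make a whole sample seem to vanish from the demultiplexed output.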

Unfortunately, none of my reads showed up in the demultiplexed data from the sequencer. When we went back and re-ran the demultiplexer on the raw (not yet processed) data, we found that the THU reads were present but had been thrown out as errors because they carried the barcode previously assigned to Amanda (whose library was not being sequenced in this particular run). We concluded that this was most likely due to a mix-up during library preparation, and we later verified that the reads were truly THU using a whole-genome BLAST.

After demultiplexing, we used an assembly pipeline called the A5 pipeline (a piece of software developed in the Eisen lab) to assemble the reads into contigs and then scaffolds. Contigs are small sections of DNA that have been compiled by aligning reads next to each other using overlapping regions as a guide. Scaffolds are even larger aligned sections of DNA that are made up of contigs. (Nature Education has a helpful diagram of this.)
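For readers who like to see the idea in code, the sketch below shows how two reads that overlap can be merged into a longer contig. It is a toy example in Python and not a description of how A5 works internally: it assumes error-free reads, exact overlaps, and a hypothetical minimum overlap of 3 bases.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left, right, min_overlap=3):
    """Merge two reads into one sequence if they overlap, otherwise return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep all of `left`, then add only the part of `right` past the overlap.
    return left + right[size:]


# Made-up reads: the first two share "GTAC", and the result shares "TTGA" with the third.
contig = merge_reads("ATGGCGTAC", "GTACCTTGA")
longer_contig = merge_reads(contig, "TTGAGGC")
```

Real assemblers have to cope with sequencing errors, repeats, and millions of reads, which is why this step is so much harder in practice than the toy version suggests.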


Once the draft genome was assembled into scaffolds, we submitted the scaffold data to RAST, a genome annotator. Genome annotation software, such as RAST, searches submitted DNA sequences to identify known genes and gene families. It also has a tool for comparing genomes to each other. Below is a summary of the RAST annotation of my organism.

I still have a lot of analyzing left to do, but it’s wonderful to finally be at this step!

It’s Library Preparation Time!

Now that we have chosen our candidates we are in the process of preparing libraries for sequencing. I’ve learned a lot about this process in the past few weeks so I thought I’d share some of what I’ve learned.

First, what is a “genomic library” anyway?

“Genomic library” is the term used to describe the prepared genomic DNA that is sent to the Illumina sequencer for sequencing. Library preparation is a critical step because the quality of a library preparation often determines the quality of the sequencing and the ease of assembly.[i]

How does one prepare a genomic library?

Although there are many different library preparation methods to choose from, all of them share the same two basic goals:

  1. To cut the DNA into small pieces. The size of the pieces depends on the type of sequencing you are trying to do and the purpose of the sequencing. In our case, we want pieces that average 500 base pairs and are at most 800 base pairs. [ii]
  2. To add adapters to each piece.

The differences between library preparation methods are largely differences in how these two goals are accomplished. For example, the DNA can be chopped enzymatically or mechanically, and the adapters can be added in one enzymatic step or several.

Pros and Cons of Library Preparation Methods:

Each step of each preparation method has various advantages and disadvantages associated with it. The primary factors for concern in library preparation are:

  • Amount of genomic DNA required – in general, the more steps involved in a preparation technique, the more genomic DNA will be required because some DNA will be lost at each step.
  • Cutting bias – certain cutting techniques may be biased depending on the DNA sequence. This is generally more of a concern in enzymatic cutting than in mechanical cutting.
  • G-C content – Amplification steps (i.e. PCR in a thermocycler) tend to change the average G-C content of the DNA sample by preferentially amplifying sequences based on the amount of guanine and cytosine in them. In general, using fewer amplification steps will decrease this bias. [iii]
  • Price – the preparation methods vary widely in price, which can be a limiting factor.
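Of these factors, G-C content is the easiest to put a number on. As a rough sketch (in Python, with made-up fragment sequences and hypothetical function names), the G-C content of a sequence is simply the fraction of its bases that are G or C. Comparing that fraction before and after amplification is one way you could look for the bias described above.

```python
def gc_content(sequence):
    """Return the fraction of bases in `sequence` that are G or C (0.0 for an empty string)."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


def pooled_gc_content(fragments):
    """Return the G-C fraction over all bases in a collection of fragments."""
    # Joining the fragments weights each one by its length.
    return gc_content("".join(fragments))


# Made-up fragments, for illustration only.
fragments = ["ATAT", "GGCC", "ATGC"]
overall = pooled_gc_content(fragments)
```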

Our Methods:

For our libraries, we will be using sonication (sound) to chop up the genomic DNA, followed by a series of enzyme treatments from the Illumina library preparation kit that will first prepare the DNA pieces for annealing the adapters and then carry out the annealing process itself.

The adapters we are using will each contain a “barcode,” a short sequence of bases unique to each sample. Barcoding allows us to pool our samples and run them on a single Illumina well, bringing down the cost of sequencing significantly.

Once we have the sequences back, we will begin the computationally challenging process of assembling and annotating them.

[i] Monya Baker, “De novo genome assembly: what every biologist should know,” Nature Methods 9.4 (2012): 333-337.

[ii] More information about how the Illumina sequencing reaction works can be found here:

and here :

[iii] Adey, Andrew; Morrison, Hilary; Asan, Xu Xun  “Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition.”

Candidate in the Spotlight: THU

A couple of posts ago I introduced you to many of our potential candidates for sequencing. As it turns out, the organism that I will be sequencing is THU (more about the name here), an organism that was already in the pilot project stage and one that is a very strong candidate for sequencing.

What do we look for in a candidate organism?

Since the goal of this project is to add useful genomes to the reference genome database, it is important that we choose organisms that have not been sequenced before and are relatively abundant without being too common. Genomes of organisms that are very common can be reconstructed from metagenomic samples so it is less important that we sequence their genomes. On the other hand, organisms that are very rare may not be characteristic of the built environment ecosystem at all.  Although rare organisms of the built environment will be interesting to sequence in the future, they are not our priority right now.

The case of THU

THU is from the Leucobacter clade and is a close relative of a strain known as Leucobacter chironomi MM2LB.

BLAST-generated 16S tree of THU (THU highlighted in yellow)

Leucobacter is a group within the Microbacteriaceae family and is characterized by the presence of 2,4-diaminobutyric acid in peptidoglycan. Leucobacter have been found in many different environments including soil, chromium-contaminated wastewater, nematode guts, potato leaves and the eggs of midges in the family Chironomidae. Some members of the clade are chromium-resistant and have been found in chromium-rich environments.[1]

The organisms THU is most closely related to have been associated with a variety of human-related activities and built environments in the past (e.g. activated sludge and industrial wastewater). Many other organisms from Leucobacter have also been found in built environments (fuel tanks, duck barns and biogas systems).[2] However, although present in these environments, Leucobacter is generally not overly prevalent.[2]

Although there are no completed or permanent draft genomes in the GOLD database, we did find a draft genome for one species of Leucobacter (Leucobacter chromiiresistens) through a Google Scholar search.[3] To our knowledge no other genomes have been published for the group.

Because of Leucobacter’s association with the built environment, its level of abundance in these environments, and the lack of many published genomes, we’ve concluded that THU is a strong candidate organism.

Edit 5/14/12:

A note about naming:

“THU” is a three-letter code that we originally used to designate our organisms. The first letter designates the environment from which the organism was isolated (in this case a residential toilet), the second is the specific organism identifier (organism “number” H from the toilet samples) and the third designates the number of lab generations (in this case, unknown).

It is important to note that our naming system evolved as we learned what information was most relevant to our work. Many of the later isolates do not follow the same code as THU (we left off the generation number, for example, in many of the later samples because we discovered it was not important to keep track of for our purposes).


1. Halpern, Malka, Shaked, Tamar, Pukall, Rudiger et al. (2009). Leucobacter chironomi sp. nov., a chromate-resistant bacterium isolated from a chironomid egg mass. International Journal of Systematic and Evolutionary Microbiology, 665–670.



Color Changes in TTU3

I thought I’d take a minute to talk about one of the first microbes we isolated that has proven to be quite interesting. The microbacterium TTU3, which was isolated from a toilet biofilm, first caught our attention as a brilliantly red colony in the middle of an otherwise rather dully colored plate. Although we know color is not necessarily an indication of environmental importance or of any quality other than color itself, TTU (as it was called then) stuck in our minds and we paid attention as it went through the sequencing process.

While investigating the ideal growth temperature of TTU, one of the things we noticed was that its color was temperature dependent. In a couple of side experiments, David Coil grew 5 mL liquid cultures at 37ºC, room temperature and 4ºC and noticed that the colder the temperature, the brighter the red color, and that at warmer temperatures the bacteria were whitish-yellow.

TTU’s color is temperature dependent, but that’s not the whole story. This week, while growing overnight cultures to verify our stock culture, I noticed that after a dilution the room temperature culture had turned from its usual pink color to the white-yellow characteristic of the 37ºC cultures. However, only the edges of the biofilm that had collected at the bottom had changed color; the densest collection of cells was still pink.

Upon further investigation, I noticed that the bright red plate (also stored at 4ºC) from which we had made the overnight cultures also had patches that had changed color where it used to be red. What was most interesting about the white-yellow patches was where they were. On the streaked plate, the white appeared only at the beginning of the streak, not in the single colonies at the end of the streak, unless the colony had been picked for an overnight. In picked colonies, the white-yellow appeared only at the edges where the heat-sterilized wand had touched the colony.

These observations led me to three possible explanations: the stock and plate have both been contaminated by a similar organism, the color change is also affected by density (perhaps quorum sensing in this organism is somehow tied to environmental temperature), or the color change is oxygen dependent (since we always limit the oxygen exposure of our 4ºC cultures to limit their growth while in storage).

I’m in the process of sequencing the 16S PCR product of these cultures, so I will know soon whether the color change is due to contamination or not. If it is not contamination, figuring out the mechanism and conditions under which TTU3 undergoes a color change may be an interesting side project to work on.

Bacterial Candidates: A Closer Look at the Contestants

After nearly ten weeks of learning our way around the lab, collecting samples, isolating organisms and sequencing their 16S ribosomal genes, we are finally at the point where we are ready to choose our first candidate organisms for whole-genome sequencing!

The Plan:

  1. Choose candidate organisms
  2. Prepare a DNA library for each organism for Illumina sequencing
  3. Sequence and analyze genomes

Although we have a couple of pilot samples for which we are already preparing libraries, most of our organisms need to be screened for admittance into the elite group of “good candidate organisms.”

So who are these potential candidates?

Where were they found?

And perhaps most relevant to our project, have they been sequenced before?
