New open access paper from my lab on "Zorro" software for automated masking of sequence alignments

A new Open Access paper from my lab was just published in PLoS ONE: "Accounting For Alignment Uncertainty in Phylogenomics." Citation: Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288

The paper describes the software “Zorro” which is used for automated “masking” of sequence alignments.  Basically, if you have a multiple sequence alignment you would like to use to infer a phylogenetic tree, in some cases it is desirable to block out regions of the alignment that are not reliable.  This blocking is called “masking.”

Many consider masking important because a sequence alignment is, in essence, a hypothesis about the common ancestry of specific residues in different genes, proteins, or regions of the genome. This "positional homology" is not always easy to assign, and where it is ambiguous it may be better to ignore those regions when inferring phylogenetic trees from the alignment.

Historically, masking has been done by eye: one scans a multiple sequence alignment for columns that seem problematic and then either removes those columns or gives them a lower weight, using a weighting scheme in the phylogenetic analysis.
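The two options described above (dropping columns versus down-weighting them) can be sketched in a few lines of Python. This is a minimal illustration, not code from Zorro: it assumes the alignment is held as a plain dict of sequence name to aligned string, and that a hand-made mask is a string of '1' (keep) and '0' (mask) with one character per column. The function names, the toy alignment, and the mask are all hypothetical.

```python
from typing import Dict, List


def apply_mask(alignment: Dict[str, str], mask: str) -> Dict[str, str]:
    """Drop every column whose mask character is '0'; keep columns marked '1'."""
    # All aligned sequences and the mask must span the same number of columns.
    if any(len(seq) != len(mask) for seq in alignment.values()):
        raise ValueError("mask and aligned sequences must have the same length")
    keep = [i for i, flag in enumerate(mask) if flag == "1"]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def mask_to_weights(mask: str, low_weight: float = 0.0) -> List[float]:
    """Instead of deleting masked columns, give them a reduced weight."""
    return [1.0 if flag == "1" else low_weight for flag in mask]


if __name__ == "__main__":
    # Toy alignment, invented for illustration only.
    aln = {"seqA": "ACG-TA", "seqB": "ACGGTA", "seqC": "A-GGTC"}
    # Hypothetical hand-made mask: columns 2 and 4 (1-based) look unreliable.
    mask = "101011"
    print(apply_mask(aln, mask))       # {'seqA': 'AGTA', 'seqB': 'AGTA', 'seqC': 'AGTC'}
    print(mask_to_weights(mask, 0.5))  # [1.0, 0.5, 1.0, 0.5, 1.0, 1.0]
```

Removal is the special case of weighting where masked columns get weight 0; a nonzero `low_weight` keeps some of the signal from doubtful columns instead of discarding it outright.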

Zorro removes much of the subjectivity from this process by generating masking patterns for sequence alignments automatically. It does this by assigning a confidence score to each column in a multiple sequence alignment. These scores can then be used to account for alignment accuracy in phylogenetic inference pipelines.
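How the scores are used downstream is up to the pipeline. As a minimal sketch (Python, and not Zorro's own code or interface), assume the per-column confidence scores have already been computed and loaded into a list of floats. The two hypothetical helpers below turn them either into a keep/drop mask via a cutoff, or into per-column weights. The 0-to-1 toy scores and the cutoff of 0.5 are invented for illustration; choosing integer weights is likewise an assumption, on the grounds that column-weight input to many tree-building programs is integer-valued.

```python
from typing import List


def scores_to_mask(scores: List[float], threshold: float) -> str:
    """Mark a column '1' (keep) if its confidence score is at least threshold, else '0'."""
    return "".join("1" if s >= threshold else "0" for s in scores)


def scores_to_integer_weights(scores: List[float], max_weight: int = 10) -> List[int]:
    """Rescale scores so the most confident column gets max_weight, rounded to integers.

    Integer weights are an illustrative assumption, not something Zorro prescribes.
    """
    top = max(scores)
    if top <= 0:
        raise ValueError("need at least one positive score")
    # Clamp at 0 so a negative score can never produce a negative weight.
    return [max(0, round(max_weight * s / top)) for s in scores]


if __name__ == "__main__":
    # Hypothetical per-column scores for a six-column alignment.
    scores = [0.95, 0.20, 0.88, 0.40, 0.99, 0.75]
    print(scores_to_mask(scores, 0.5))        # 101011
    print(scores_to_integer_weights(scores))  # [10, 2, 9, 4, 10, 8]
```

A hard cutoff throws away everything below it, while weighting lets low-confidence columns contribute proportionally less; which is preferable is a modelling choice the sketch does not settle.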

The software is available at SourceForge: ZORRO – probabilistic masking for phylogenetics. It was written primarily by Martin Wu (who is now a Professor at the University of Virginia) and Sourav Chatterji, with, I think, a little help here and there from Aaron Darling. The development of Zorro was part of my "iSEEM" project, which was supported by the Gordon and Betty Moore Foundation.

In the interest of sharing, since the paper is fully open access, I am posting it here below the fold. UPDATE 2/9: I decided to remove it, since it got in the way of getting to the comments.

Author: Jonathan Eisen

I am an evolutionary biologist and a Professor at UC Davis. My research focuses on the origin of novelty (how new processes and functions originate). To study this, I sequence and analyze the genomes of organisms, especially microbes, using phylogenomic analysis.

7 thoughts on “New open access paper from my lab on "Zorro" software for automated masking of sequence alignments”

  1. Gonna have to take a look. I was not aware of yours, though from a first skim I probably should have been. The perils of submitting a paper years ago and then trying to finish it up and deal with reviewers' comments over a few years.


  2. OK, I have gone through the PICS-Ord paper and some blog posts about it (from Squamules and The Panda's Thumb). Very interesting. It would be worth comparing and contrasting how Zorro and PICS-Ord perform on the same data set(s). Not sure what is going to happen with Zorro. The first author, Martin, is probably the person to discuss it with, since he started working on it in my lab but then got one of those faculty jobs at UVA and (I assume) is going to continue to work on it if he can. I will send him a message and see if we can bring him into the conversation.


  3. Hi Reed, apparently we missed your paper. It looks like Zorro and PICS-Ord take two completely different approaches. Not sure how to integrate them. I'm all ears if you have some ideas.


  4. Hi Martin and Jonathan,

    The purpose of Zorro is to identify reliable and unreliable regions of an alignment so that the unreliable ones can be masked, correct? With PICS-Ord we convert the unreliable regions (post masking) into pseudo-characters that capture much of the phylogenetic information in these regions. Right now we are using Viterbi-esque distance scores produced by Ngila, but I do want to use the forward-backward distance scores produced by the software I wrote for Cartwright (2009). (That software needs to be made more robust, which is why I haven't distributed it yet.)

    So the way I see the two programs working together is: 1) Zorro discovers regions with low alignment support, and then 2) PICS-Ord converts these regions to pseudo-characters to aid phylogenetic reconstruction.

