New open access paper from my lab on "Zorro" software for automated masking of sequence alignments

A new Open Access paper from my lab was just published in PLoS One: Accounting For Alignment Uncertainty in Phylogenomics. Wu M, Chatterji S, Eisen JA (2012) Accounting For Alignment Uncertainty in Phylogenomics. PLoS ONE 7(1): e30288. doi:10.1371/journal.pone.0030288

The paper describes the software “Zorro” which is used for automated “masking” of sequence alignments.  Basically, if you have a multiple sequence alignment you would like to use to infer a phylogenetic tree, in some cases it is desirable to block out regions of the alignment that are not reliable.  This blocking is called “masking.”

Masking is thought by many to be important because a sequence alignment is in essence a hypothesis about the common ancestry of specific residues in different genes/proteins/regions of the genome.  This “positional homology” is not always easy to assign, and where it is ambiguous it may be better to ignore those regions when inferring phylogenetic trees from alignments.

Historically, masking has been done by hand: one looks by eye for columns in a multiple sequence alignment that seem to have issues and then either eliminates those columns or gives them a lower weight in a weighted phylogenetic analysis.

Zorro removes much of the subjectivity of this process by generating masking patterns for sequence alignments automatically.  It does this by assigning a confidence score to each column in a multiple sequence alignment. These scores can then be used to account for alignment accuracy in phylogenetic inference pipelines.
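
To make the idea concrete, here is a rough sketch (not Zorro's own code) of how per-column confidence scores could be turned into a mask. Everything here is an assumption for illustration: the function name `mask_alignment`, the toy alignment, the 0 to 10 score scale, and the cutoff of 5.0 are all hypothetical, and in practice the scores would come from a tool like Zorro rather than being typed in by hand.

```python
def mask_alignment(alignment, scores, threshold=5.0):
    """Drop alignment columns whose confidence score is below `threshold`.

    `alignment` maps sequence names to aligned sequences (all the same length).
    `scores` holds one confidence score per alignment column.
    Both the data layout and the default threshold are illustrative assumptions.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (n_columns,) = lengths
    if n_columns != len(scores):
        raise ValueError("need exactly one score per alignment column")

    # Indices of the columns considered reliable enough to keep.
    keep = [i for i, score in enumerate(scores) if score >= threshold]

    # Rebuild each sequence from the retained columns only.
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy example: columns 3 and 4 (gappy, low scores) are masked out.
alignment = {
    "seqA": "MK-LVT",
    "seqB": "MKALVT",
    "seqC": "MR--IT",
}
scores = [9.5, 8.0, 1.2, 2.5, 7.0, 9.9]  # made-up confidence scores
masked = mask_alignment(alignment, scores, threshold=5.0)
for name, seq in masked.items():
    print(name, seq)
```

A hard cutoff like this is the simplest option. The same scores could instead be passed along as column weights if the downstream phylogenetics program supports weighting.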

The software is available at SourceForge: ZORRO – probabilistic masking for phylogenetics.  It was written primarily by Martin Wu (who is now a Professor at the University of Virginia) and Sourav Chatterji, with a little help here and there from Aaron Darling, I think.  The development of Zorro was part of my “iSEEM” project that was supported by the Gordon and Betty Moore Foundation.

In the interest of sharing, since the paper is fully open access, I am posting it here below the fold. UPDATE 2/9 – decided to remove this since it got in the way of getting to the comments …

Announcement: Workshop on Multiple Sequence Alignment and Phylogeny Estimation

Posting this for Tandy Warnow

Workshop on Advances in Multiple Sequence Alignment and Phylogeny Estimation

May 20 and 21, 2012, Smithsonian Institution, Washington, DC

The workshop is funded by the National Science Foundation through grant DEB 0733029 to the University of Texas. Registration is required, and attendance is limited to 40 participants. The workshop will include presentations of new methods for multiple sequence alignment and phylogeny estimation, as well as training in the use of these methods and personal assistance in analyzing datasets using the SATé software (see this page). Applications for the workshop (and for travel support) are due by February 15, 2012, and will be responded to by March 1. We expect to be able to provide support to all attendees. Please click here for the application form. For more information, please send an email to Tandy Warnow (see below).

Letter from Tandy explaining workshop:
Dear Colleagues,
We are writing to let you know about a workshop and symposium that we will hold on May 20-22, 2012, at the Smithsonian Institution in Washington, DC. The workshop will provide training in advanced methods for multiple sequence alignment and phylogeny estimation, and will take place on May 20 and 21; the symposium will follow immediately and will feature research presentations on the same topic. This workshop is funded by the National Science Foundation through grant DEB 0733029 to the University of Texas.
The workshop will include presentations of new methods for maximum likelihood phylogeny estimation of large sequence alignments (including GARLI and FastTree), for comparing different alignments of the same dataset, for phylogenetic analyses of datasets that include partial sequences (e.g., short reads generated in a metagenomic analysis), for supertree estimation, and for simulating sequence evolution. However, a main focus is to train participants in both basic and advanced use of the SATé software (Liu et al. 2009, Science, Vol. 324, no. 5934, pp. 1561-1564) for simultaneous estimation of alignments and trees (SATé software available for download at ).
Workshop participants are expected to bring laptops with them to the workshop, so that they can perform alignment and phylogenetic tree estimations. We will provide test datasets for you to learn how to use SATé, but strongly encourage you to bring your own datasets to analyze.
Attendance at the workshop is limited to 40 participants, and registration is required. If you are interested in attending the workshop, whether or not you are requesting travel support, please fill out the Word document available at, and return it to Laurie Alvarez ( by February 15, 2012. We will respond to requests for registration by March 1, 2012.
For more information on the workshop, please contact me (Tandy Warnow), at For more information on the Symposium, please contact Mike Braun ( We look forward to seeing you at the Smithsonian workshop and symposium!
Tandy Warnow and Mike Braun
On behalf of the AToL project team:
  • Michael Braun, The Smithsonian Institution 
  • Mark Holder, The University of Kansas
  • Jim Leebens-Mack, The University of Georgia 
  • Randy Linder, The University of Texas 
  • Etsuko Moriyama, The University of Nebraska 
  • Tandy Warnow, The University of Texas