Had no idea this paper was coming out: PLOS ONE: Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison.
Full citation: Matsen IV FA, Evans SN (2013) Edge Principal Components and Squash Clustering: Using the Special Structure of Phylogenetic Placement Data for Sample Comparison. PLoS ONE 8(3): e56859. doi:10.1371/journal.pone.0056859
And it is very very cool. My lab has been working with / collaborating with / wanting to be like Erick Matsen for a few years now, and this paper is one of the reasons why. In this paper, Matsen and Evans detail some really powerful and fascinating tools for phylogeny-driven analysis of microbial communities.
Edge principal component analysis (edge PCA)
- “enables the detection of important differences between samples that contain closely related taxa” (from the Abstract)
- “applies the standard principal components construction to a “data matrix” generated from the differences between proportions of phylogenetic placements on either side of each internal edge of the reference phylogenetic tree.” (from the Introduction)
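To make the "data matrix" idea concrete, here is a minimal sketch in Python. It is not the authors' implementation. The toy tree `PARENT`, the list `INTERNAL_EDGES`, and the function names are all made up for illustration, and I am assuming that mass sitting on an internal edge itself counts toward neither side and that the sign convention is root side minus distal side.

```python
import numpy as np

# Hypothetical toy reference tree ((A,B)X,(C,D)Y)R, stored as child -> parent.
# Each edge is named after the node at its distal (child) end.
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "R", "Y": "R"}
INTERNAL_EDGES = ["X", "Y"]  # edges whose distal end is an internal node


def subtree_edges(node):
    """Return the edges strictly below `node`."""
    edges = []
    for child, parent in PARENT.items():
        if parent == node:
            edges.append(child)
            edges.extend(subtree_edges(child))
    return edges


def edge_difference_matrix(samples):
    """Rows are samples, columns are internal edges.

    Each entry is (proportion of placements on the root side of the edge)
    minus (proportion on the distal side). Assumption: mass placed on the
    edge itself is counted on neither side.
    """
    rows = []
    for mass in samples:
        total = sum(mass.values())
        row = []
        for edge in INTERNAL_EDGES:
            distal = sum(mass.get(e, 0.0) for e in subtree_edges(edge)) / total
            on_edge = mass.get(edge, 0.0) / total
            proximal = 1.0 - distal - on_edge
            row.append(proximal - distal)
        rows.append(row)
    return np.array(rows)


def edge_pca(samples):
    """Standard PCA (eigendecomposition of the covariance) on the matrix."""
    matrix = edge_difference_matrix(samples)
    centered = matrix - matrix.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return eigvals[order], eigvecs[:, order], centered @ eigvecs[:, order]


# Placement counts per edge for three made-up samples.
samples = [{"A": 3, "C": 1}, {"A": 1, "C": 3}, {"B": 2, "D": 2}]
eigvals, components, scores = edge_pca(samples)
```

Because every column of the matrix belongs to a specific edge, each principal component can be read back onto the reference tree as a set of per-edge weights.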
Squash clustering
- “outputs a (rooted) clustering tree in which each internal node corresponds to an appropriate “average” of the original samples at the leaves below the node. Moreover, the length of an edge is a suitably defined distance between the averaged samples associated with the two incident nodes, rather than the less interpretable average of distances produced by UPGMA, the most widely used hierarchical clustering method in this context” (from the Abstract)
- is hierarchical clustering with a novel way of merging clusters that incorporates information concerning how the data sit on the reference phylogenetic tree (from the Introduction)
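Here is a rough sketch of the squash idea in Python, again not the authors' code. The toy tree, the branch lengths, and the function names are hypothetical. I am assuming each sample's placement mass sits at nodes of the reference tree, that the distance between two samples is the earth mover's distance on the tree (sum over edges of branch length times the difference in mass below that edge), and that merged clusters are averaged with weights equal to the number of original samples in each.

```python
# Hypothetical toy reference tree ((A,B)X,(C,D)Y)R, stored as child -> parent.
# Each edge is named after its distal (child) node; all lengths are made up.
PARENT = {"A": "X", "B": "X", "C": "Y", "D": "Y", "X": "R", "Y": "R"}
LENGTH = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "X": 1.0, "Y": 1.0}


def below(node):
    """Return `node` plus every node in the subtree under it."""
    nodes = [node]
    for child, parent in PARENT.items():
        if parent == node:
            nodes.extend(below(child))
    return nodes


def tree_distance(p, q):
    """Earth mover's distance between two normalized mass dicts on the tree."""
    total = 0.0
    for edge in PARENT:
        mass_p = sum(p.get(n, 0.0) for n in below(edge))
        mass_q = sum(q.get(n, 0.0) for n in below(edge))
        total += LENGTH[edge] * abs(mass_p - mass_q)
    return total


def squash_cluster(samples):
    """Repeatedly merge the closest pair, replacing it with its average.

    Returns a list of (new_name, left, right, left_len, right_len), where
    the lengths are distances from the averaged node to each child.
    """
    clusters = {name: (mass, 1) for name, mass in samples.items()}
    merges = []
    while len(clusters) > 1:
        names = sorted(clusters)
        pairs = [
            (tree_distance(clusters[a][0], clusters[b][0]), a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
        ]
        _, a, b = min(pairs)
        (pa, na), (pb, nb) = clusters.pop(a), clusters.pop(b)
        # "Squash": weighted average of the two mass distributions.
        merged = {
            k: (na * pa.get(k, 0.0) + nb * pb.get(k, 0.0)) / (na + nb)
            for k in set(pa) | set(pb)
        }
        new_name = f"({a},{b})"
        merges.append(
            (new_name, a, b, tree_distance(merged, pa), tree_distance(merged, pb))
        )
        clusters[new_name] = (merged, na + nb)
    return merges


samples = {"s1": {"A": 1.0}, "s2": {"B": 1.0}, "s3": {"C": 1.0}}
merges = squash_cluster(samples)
```

The contrast with UPGMA is in the merge step: UPGMA only keeps an average of pairwise distances, while here each internal node holds an actual averaged sample, so every edge length in the clustering tree is a distance between two mass distributions on the reference tree.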
UPDATE 3/21 – switched the captions for Figures 2 and 3 as per Matsen’s comment that the legends were switched in production of the paper.