1. Typo in last line of first paragraph of abstract.
2. Methods third paragraph, second line. Change “data is” to “data are.”
3. I think the paper might benefit from more comparison and contrast of results between the concatenation and concordance approaches here, but that is, of course, up to the authors.
4. Given that straight taxonomic congruence is rarely used these days in phylogenomic analyses, maybe the authors could also execute straight taxonomic congruence analyses.
5. Could the authors justify the expected concordance priors used for the BUCKy analyses?
6. Could the authors describe how ML bootstrap trees are substituted for the usual input to the BUCKy analyses (samples of Bayesian topologies)?
7. Paragraph on Bayesian concatenation in Methods: in the last sentence, should “below 0.1” instead be “below 0.01”?
8. First, when the authors discuss LBA, it may be more important to discuss model misfit. As I understand the concept, if the ML model were a good fit, LBA would not occur, and the sometimes inexplicable results that the authors report would not have occurred. So the misplacement of long branches here may not really be classic LBA (long branches attracting each other), because ML with a correct model should not suffer from LBA.
9. Second, I would strongly suggest that the authors run simple parsimony analyses of their data (perhaps these would not take too long in TNT?). Do parsimony analyses give much worse results than ML, or perhaps better ones? What if Goloboff's implied weighting were used, which downweights sites according to the homoplasy observed at each site in the data (rather than relying on an a priori model that is surely very wrong anyway)? I would be curious to see what sort of trees such analyses produce.
10. The last paragraph of the BUCKy discussion section notes that the concordance approach extracts hidden signal without resorting to concatenation (“resorting to” makes concatenation sound painful). However, here again, I think it would be productive for the authors to execute a simple taxonomic congruence analysis, for example a 50% majority-rule consensus of their single-gene bootstrap trees, or a 50% majority-rule consensus of the strict consensus of the optimal trees for each gene. If these simple consensus procedures yield trees that are highly consistent with the concatenation tree, that would suggest it is not BUCKy's concordance analysis that is extracting hidden support, because taxonomic congruence, which ignores hidden support, would then give trees similar to concatenation. Another point is that even if BUCKy yields a topology similar to the concatenation tree, it is not clear that BUCKy extracts nearly as much hidden support as concatenation does. For example, Gatesy and Baker (2005) have shown that huge amounts of hidden support can emerge even when completely congruent genes are combined; so even if BUCKy gives a tree similar to concatenation, that tree might be very weakly supported, indicating that hidden support was not extracted efficiently.
11. Placement of Interesting Taxa, first paragraph. The 50% threshold for increased support in concatenation relative to 16S seems too low to me, but that is just my opinion.
12. The most important concern I have is that I don’t really know what the authors are trying to say. Are they saying that the supertree approach of using BUCKy on RAxML bootstrap trees is as good as RAxML on the combined dataset, or that RAxML is better? Are they saying that they’ve learned something interesting about microbial phylogeny? Are they saying that MrBayes is infeasible for large-scale phylogeny estimation? I’m just not sure what their take-home message is.
13. The authors conjecture that the reason the BUCKy analysis produced low support values is that the individual genes had low signal. However, I think there may be other reasons that the authors need to investigate. First, according to the authors, because BUCKy requires that every gene tree contain all the taxa, the authors added completely gapped sequences to each dataset before running RAxML to estimate the gene trees. This has the consequence that the added taxa are inserted randomly into the gene trees. It is not at all surprising that supertree analyses based upon gene trees with some randomly inserted taxa would have low support. This makes all the analyses based upon BUCKy unreliable. Note that the problem caused by adding empty sequences to the individual gene datasets does not affect the combined analysis, so it only impacts BUCKy. Note also that the authors could have avoided this problem by restricting the analysis to only those genes that truly contain at least one copy of each taxon; the problem arises from using the additional genes that were not universal.
14. If the authors added the empty sequences to the sequence alignments given to MrBayes, then it is also not surprising that MrBayes would fail to converge in a reasonable timeframe. This means that the conclusions about MrBayes not converging might need to be revisited.
15. The use of only 100 bootstrap replicates for a dataset of this size is questionable. It is possible, therefore, that BUCKy would produce a different species tree (with higher support values) if the authors provided it with substantially more than 100 trees for each gene, each produced on a different bootstrap replicate.
16. Also, it is essential to report the statistics that let the reader know whether the rapid bootstrapping technique in RAxML has converged.
17. The reported running time for BUCKy is extremely short, and I find it difficult to reconcile with the running time reported for a much smaller dataset in Yang and Warnow 2010 (the paper referenced in this study as recommending RAxML bootstrapping instead of MrBayes). I suspect that the BUCKy analyses completed quickly because the authors disabled the population tree estimation, which takes $\Omega(n^4)$ time; if that is the case, the authors should point this out and also make the code available.
18. I could not find the datasets and trees in TreeBASE. Searching for Lang or Eisen as authors returned nothing that mentioned this paper. The authors should give links to these datasets, if possible.
19. The text in the conclusions section of the paper does not justify the recommendation given in the abstract that combined analysis using RAxML is preferred to BUCKy. However, it seems that the authors may be relying on the fact that the RAxML rapid-bootstrapping tree had higher support values than the BUCKy tree to justify this recommendation. Given that high bootstrap support can occur in combined analyses for the wrong tree (as noted by the authors), this does not seem to be a good reason. Furthermore, note the earlier comments about support values.
20. The evidence that MrBayes does not converge on the 841-taxon single-gene datasets, even given 9 months of analysis, is very interesting and, if valid, really significant for researchers. However, the maximum permitted ASDSF is given differently in two places in the paper: initially as 0.1 and later as 0.01. The correct value should be used throughout. Also, as discussed above, if the MrBayes analyses of individual gene sequence alignments were based upon datasets that contained completely gapped sequences, then the failure to converge is not at all surprising. (Finally, it seems the same observation about MrBayes failing to converge was made in Yang and Warnow 2010, so this may be a general observation that is important to communicate.)
21. The authors fail to mention any standard supertree methods, which is strange. Why isn’t MRP mentioned, for example? For the special case handled in this paper (where all gene trees have one copy of each species), consensus methods (such as majority-rule consensus) can also be considered. Finally, there are many supertree methods that do take incomplete lineage sorting into consideration and have excellent performance (*BEAST, for example) but are not mentioned here; at a minimum, the authors should discuss them.
22. The authors say that the gene trees had relatively low signal, pointing to the short sequence lengths. However, other factors could be involved, including the fact that the sequences were amino acids rather than nucleotides. If the nucleotide sequences are available, phylogenies based upon them could be more informative. Also, to strengthen the evidence that their amino-acid alignments contain low signal, the authors could provide more statistics about their alignments, such as the average percent identity, the minimum percent identity, and the number of gaps.
23. The use of the Robinson-Foulds distance seems potentially problematic, since at least one of their trees is not fully resolved. In any event, the values should be presented proportionally, to help readers understand the trends. If the trees have 841 leaves, then each fully resolved unrooted tree has 841 - 3 = 838 nontrivial splits, so RF distances will range from 0 to 2 × 838 = 1676. Values can therefore be expressed as a percentage: an RF distance of 200 corresponds to an RF rate of only about 12%. If the trees they show include only Bacteria, then the number of taxa will be smaller; all the more reason to report, in each case, the RF rate rather than the RF distance (the actual number of unique splits).
24. Page 6, line 57 “causing those taxa to be placed randomly on the trees.” Are you sure that taxa consisting entirely of missing data will be placed randomly? There are cases in which random sequences are not equally likely to be placed in any position on a tree (Susko et al. 2005, J Mol Evol 61:351-359). I think that including the taxa with entirely missing data is appropriate, but suggest deleting “causing those taxa to be placed randomly on the trees.” There’s no need to go into more detail or to cite the reference.
25. Throughout, the metric should be “Robinson-Foulds”, not “Robinson-Fould’s” (and the second author’s name is missing from reference 50).
26. Page 10, line 34 and Figure 7 legend “significant positive correlation” should read “significant negative correlation”.
27. Page 10, line 50 “also, the BUCKy tree does not include branch lengths”, but the branches on Figure 5 are not all the same length. Add a brief explanation in the figure legend.
28. Italicize species names in reference list.
29. Figure 7 could be improved by redrawing as follows. Label both axes. Delete the horizontal grid lines. Give the R^2 to only a couple of decimal places. Consider whether the fitted line is needed, and if so, explain what it is in the figure legend (a least-squares regression line?). If the figure is intended only to show the association between RF distance and alignment length, rather than to predict one from the other, then the line is not needed. Finally, there are two influential points (the two longest alignments). If those two were absent, the relationship would be rather different. Is this because the relationship is nonlinear over a wide enough range of alignment length, or because there’s something else unusual about the two longest alignments? It might be worth commenting briefly in the results section.
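To illustrate what I mean in comment 9 by weighting sites according to their observed homoplasy, here is a minimal Python sketch (not the TNT implementation) that scores a fixed tree with Fitch parsimony and Goloboff's implied-weighting fit k / (k + h), where h is the number of extra steps beyond the minimum for a site. The tree encoding, taxon names, and matrix are hypothetical; gaps and missing data are not handled, and k = 3 is assumed as the concavity constant (I believe this is TNT's default, but the authors should check).

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a binary tree as nested 2-tuples with taxon names at
# the leaves. Fitch step counts do not depend on where the tree is rooted.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_steps(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one site on a binary tree (Fitch)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # Disjoint state sets force one additional change at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def implied_weight_fit(steps: int, n_states: int, k: float = 3.0) -> float:
    """Goloboff fit k / (k + h), with h = steps beyond the minimum n_states - 1."""
    extra_steps = steps - (n_states - 1)
    return k / (k + extra_steps)


def total_fit(tree: Tree, matrix: Dict[str, str], k: float = 3.0) -> float:
    """Sum of per-site fits; implied weighting prefers the tree maximizing this."""
    taxa = list(matrix)
    total = 0.0
    for site in range(len(matrix[taxa[0]])):
        column = {taxon: matrix[taxon][site] for taxon in taxa}
        steps = fitch_steps(tree, column)
        total += implied_weight_fit(steps, len(set(column.values())), k)
    return total


if __name__ == "__main__":
    matrix = {"A": "AAC", "B": "AAC", "C": "GGC", "D": "GTC"}  # toy data
    for tree in ((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))):
        print(tree, total_fit(tree, matrix))
```

On this toy matrix the first tree has no homoplasy at any site and scores 3.0, while the second requires one extra step at the first site and scores 2.75, so the first would be preferred.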
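For the simple taxonomic congruence check suggested in comment 10, a 50% majority-rule consensus needs only the bipartitions (splits) of each gene tree. The minimal Python sketch below assumes gene trees on a shared taxon set, encoded as hypothetical nested tuples, and keeps the splits found in more than half of the trees; in practice a standard phylogenetics package would read the authors' Newick files instead.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: nested tuples with taxon names at the leaves.
Tree = Union[str, Tuple["Tree", ...]]
Split = FrozenSet[str]


def leaves(node: Tree) -> FrozenSet[str]:
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def splits(tree: Tree) -> Set[Split]:
    """Nontrivial bipartitions of the unrooted tree. Each split is stored as the
    side that excludes a fixed reference taxon, so the key ignores rooting."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found: Set[Split] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - clade if ref in clade else clade
        if 2 <= len(side) <= len(all_taxa) - 2:  # skip trivial splits
            found.add(side)
        return clade

    visit(tree)
    return found


def majority_rule(trees: List[Tree], threshold: float = 0.5) -> Set[Split]:
    """Splits present in more than `threshold` of the trees (mutually compatible
    when threshold >= 0.5, so they define a single consensus tree)."""
    counts = Counter(s for tree in trees for s in splits(tree))
    return {s for s, c in counts.items() if c / len(trees) > threshold}


if __name__ == "__main__":
    gene_trees = [
        ((("A", "B"), "C"), ("D", "E")),
        ((("A", "B"), "D"), ("C", "E")),
        ((("A", "C"), "B"), ("D", "E")),
    ]
    for split in majority_rule(gene_trees):
        print(sorted(split))
```

Here {C, D, E} and {D, E} each appear in two of the three toy trees and are retained; the two splits supported by only one tree are dropped. Comparing such a consensus with the concatenation tree (for example by RF distance) is the comparison I have in mind.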
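As a back-of-the-envelope check on comment 17: if the population-tree step has to consider every four-taxon subset (my assumption about where the $\Omega(n^4)$ cost comes from), the count alone is C(n, 4), which grows as n^4/24. The few lines of standard-library Python below show the scale for the 841 taxa in this study; 100 taxa is an arbitrary comparison size, not the size of the Yang and Warnow dataset.

```python
from math import comb


def n_quartets(n_taxa: int) -> int:
    """Number of distinct four-taxon subsets, C(n, 4) = n(n-1)(n-2)(n-3)/24."""
    return comb(n_taxa, 4)


if __name__ == "__main__":
    for n in (100, 841):  # 100 is an arbitrary comparison size
        print(f"{n} taxa: {n_quartets(n):,} quartets")
```

For 841 taxa this gives 20,695,218,670 quartets, more than 5,000 times the 3,921,225 quartets for 100 taxa, which is why a very short reported running time suggests that this step was skipped.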
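Because the ASDSF cutoff matters for comments 7 and 20, here is a minimal Python sketch of the statistic as I understand it: for each split, take the standard deviation of its frequency across independent runs, then average over splits. The input format is hypothetical, the sample standard deviation is used, and splits below a minimum frequency in every run are skipped; whether these details match MrBayes exactly is an assumption the authors should check against its documentation. The toy runs show why the choice between 0.1 and 0.01 matters: the same runs can pass one cutoff and fail the other.

```python
from statistics import stdev
from typing import Dict, FrozenSet, List

Split = FrozenSet[str]  # one side of a bipartition (hypothetical encoding)


def asdsf(runs: List[Dict[Split, float]], min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across independent runs.

    Each run maps splits to their sampled frequencies; splits whose frequency
    is below min_freq in every run are ignored (assumed filter).
    """
    if len(runs) < 2:
        raise ValueError("ASDSF needs at least two independent runs")
    all_splits = set().union(*runs)  # union of split keys over all runs
    kept = [s for s in all_splits
            if max(run.get(s, 0.0) for run in runs) >= min_freq]
    if not kept:
        return 0.0
    sds = [stdev([run.get(s, 0.0) for run in runs]) for s in kept]
    return sum(sds) / len(sds)


if __name__ == "__main__":
    ab, cd = frozenset({"A", "B"}), frozenset({"C", "D"})
    runs = [{ab: 0.9, cd: 0.5}, {ab: 0.7, cd: 0.5}]  # toy split frequencies
    value = asdsf(runs)
    for cutoff in (0.1, 0.01):
        print(f"ASDSF = {value:.4f}; below {cutoff}? {value < cutoff}")
```

With these toy frequencies the ASDSF is about 0.07, which would count as converged under a 0.1 cutoff but not under 0.01.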
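The alignment statistics requested in comment 22 are straightforward to compute. The minimal Python sketch below assumes aligned amino-acid sequences held in a dictionary with '-' as the gap character, and defines percent identity over columns where neither sequence has a gap (other definitions exist, so the authors should state which one they use). Pairs with no overlapping residues, such as any pair involving a completely gapped placeholder sequence (comment 13), are skipped rather than counted as zero identity.

```python
from itertools import combinations
from typing import Dict, Tuple


def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return float("nan")
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)


def alignment_stats(aln: Dict[str, str], gap: str = "-") -> Tuple[float, float, int]:
    """Return (mean pairwise identity, minimum pairwise identity, total gap characters)."""
    identities = [pairwise_identity(aln[p], aln[q], gap)
                  for p, q in combinations(aln, 2)]
    identities = [v for v in identities if v == v]  # NaN != NaN, so NaNs drop out
    if not identities:
        raise ValueError("no pair of sequences shares an ungapped column")
    n_gaps = sum(seq.count(gap) for seq in aln.values())
    return sum(identities) / len(identities), min(identities), n_gaps


if __name__ == "__main__":
    toy = {"t1": "MKV-L", "t2": "MKVAL", "t3": "MRV-I", "t4": "-----"}  # hypothetical
    print(alignment_stats(toy))
```

For the toy alignment this reports a mean identity of about 66.7%, a minimum of 50%, and 7 gap characters; the fully gapped t4 contributes only to the gap count.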
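To make the arithmetic in comment 23 explicit: a fully resolved unrooted tree on n leaves has n - 3 nontrivial splits, so the RF distance between two such trees is at most 2(n - 3), which is 1676 for n = 841. The minimal Python sketch below computes the RF distance as the size of the symmetric difference of two split sets and reports it as a rate. The split sets are assumed to be precomputed (a hypothetical format); for trees that are not fully resolved, dividing by the total number of splits in the two trees is one common alternative normalization.

```python
from typing import FrozenSet, Set

Split = FrozenSet[str]  # one side of a nontrivial bipartition


def rf_distance(splits_a: Set[Split], splits_b: Set[Split]) -> int:
    """Number of nontrivial splits found in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


def rf_rate_binary(rf: int, n_taxa: int) -> float:
    """RF distance divided by its maximum, 2(n - 3), for fully resolved unrooted trees."""
    return rf / (2 * (n_taxa - 3))


def rf_rate_observed(splits_a: Set[Split], splits_b: Set[Split]) -> float:
    """RF distance divided by the total number of splits in both trees,
    which stays meaningful when one tree contains polytomies."""
    total = len(splits_a) + len(splits_b)
    return rf_distance(splits_a, splits_b) / total if total else 0.0


if __name__ == "__main__":
    print(f"{rf_rate_binary(200, 841):.1%}")  # RF distance of 200 on 841 leaves
```

An RF distance of 200 on 841 leaves corresponds to a rate of 200 / 1676, or about 11.9%.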
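For the influential-point question in comment 29, one standard diagnostic is Cook's distance for the least-squares line, together with R^2 recomputed without the two longest alignments. The minimal Python sketch below uses only the standard library; xs and ys stand in for the per-gene alignment lengths and RF distances behind Figure 7, and the numbers in it are purely hypothetical toy values.

```python
from typing import List, Sequence, Tuple


def ols(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line."""
    slope, intercept = ols(xs, ys)
    my = sum(ys) / len(ys)
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


def cooks_distance(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Cook's distance of each point in a simple linear regression (p = 2)."""
    n = len(xs)
    slope, intercept = ols(xs, ys)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance estimate
    out = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - mx) ** 2 / sxx  # leverage of this point
        out.append(e * e / (2 * s2) * h / (1 - h) ** 2)
    return out


def without_longest(xs: Sequence[float], ys: Sequence[float],
                    k: int = 2) -> Tuple[List[float], List[float]]:
    """Drop the k points with the largest x (e.g. the two longest alignments)."""
    keep = sorted(range(len(xs)), key=lambda i: xs[i])[:len(xs) - k]
    return [xs[i] for i in keep], [ys[i] for i in keep]


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 20]   # toy x values (hypothetical)
    ys = [10, 8, 6, 4, 2, 5]   # toy y values (hypothetical)
    print("R^2, all points:", round(r_squared(xs, ys), 2))
    print("R^2, two largest x removed:", round(r_squared(*without_longest(xs, ys)), 2))
    print("Cook's D:", [round(d, 2) for d in cooks_distance(xs, ys)])
```

In the toy data the single high-leverage point dominates Cook's distance, and removing the two largest x values changes R^2 sharply; a similar check on the real data would show whether the two longest alignments drive the trend in Figure 7.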