A simulation study comparing supertree and combined analysis methods using SMIDGen
Authors: | C. Randal Linder, M. Shel Swenson, François Barbançon, Tandy Warnow |
---|---|
Language: | English |
Year of publication: | 2010 |
Subject: |
lcsh:QH426-470
0206 medical engineering; Matrix representation; 02 engineering and technology; Biology; computer.software_genre; Set (abstract data type); 03 medical and health sciences; Structural Biology; Supermatrix; Molecular Biology; lcsh:QH301-705.5; Analysis method; 030304 developmental biology; 0303 health sciences; business.industry; Applied Mathematics; Research; Pattern recognition; Base (topology); Supertree; Maximum parsimony; Tree (data structure); lcsh:Genetics; Computational Theory and Mathematics; lcsh:Biology (General); Artificial intelligence; Data mining; business; computer; 020602 bioinformatics |
Source: | Algorithms for Molecular Biology, Vol 5, Iss 1, p 8 (2010); Algorithms for Molecular Biology : AMB |
ISSN: | 1748-7188 |
Description: | Background: Supertree methods are one approach to reconstructing large molecular phylogenies from multi-marker datasets: a tree is estimated on each marker, and the resulting trees are then combined into a single tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach), in which the sequence data matrices for the different subsets of taxa are concatenated into a single supermatrix and a tree is estimated on that supermatrix. Results: In this paper, we describe an extensive simulation study comparing two supertree methods, MRP and weighted MRP, to combined analysis methods on large model trees. A key contribution of this study is our novel simulation methodology (Super-Method Input Data Generator, or SMIDGen), which better reflects biological processes and the practices of systematists than earlier simulations. We show that combined analysis based on maximum likelihood outperforms MRP and weighted MRP, with especially large improvements when the largest subtree does not contain most of the taxa. Conclusions: This study demonstrates that MRP and weighted MRP produce distinctly less accurate trees than combined analyses for a given base method (maximum parsimony or maximum likelihood). Since there are situations in which combined analyses are not feasible, there is a clear need for better supertree methods. The source tree and combined datasets used in this study can be used to test other supertree and combined analysis methods. |
Database: | OpenAIRE |
External link: |