The Prevalence and Impact of Model Violations in Phylogenetic Analysis
Author: | Robert Lanfear, Bui Quang Minh, Eric A. Stone, Suha Naser-Khdour, Wenqi Zhang |
---|---|
Year of publication: | 2019 |
Subject: |
0106 biological sciences; systematic bias; Phylogenetic inference; Inference; Biology; 010603 evolutionary biology; 01 natural sciences; Evolution, Molecular; 03 medical and health sciences; Bias; Phylogenetics; phylogenetic inference; Statistics; Genetics; Base Pairing; Phylogeny; Ecology, Evolution, Behavior and Systematics; 030304 developmental biology; Likelihood Functions; 0303 health sciences; Models, Genetic; Phylogenetic tree; Homogeneity (statistics); model violations; Homogeneous; test of symmetry; Software; Research Article |
Source: | Genome Biology and Evolution |
ISSN: | 1759-6653 |
DOI: | 10.1093/gbe/evz193 |
Description: | In phylogenetic inference, we commonly use models of substitution which assume that sequence evolution is stationary, reversible, and homogeneous (SRH). Although the use of such models is often criticized, the extent of SRH violations and their effects on phylogenetic inference of tree topologies and edge lengths are not well understood. Here, we introduce and apply the maximal matched-pairs tests of homogeneity to assess the scale and impact of SRH model violations on 3,572 partitions from 35 published phylogenetic data sets. We show that roughly one-quarter of all the partitions we analyzed (23.5%) reject the SRH assumptions, and that for 25% of data sets, tree topologies inferred from all partitions differ significantly from topologies inferred using the subset of partitions that do not reject the SRH assumptions. This proportion increases when comparing trees inferred using the subset of partitions that rejects the SRH assumptions to those inferred from partitions that do not reject the SRH assumptions. These results suggest that the extent and effects of model violation in phylogenetics may be substantial. They highlight the importance of testing for model violations and possibly excluding partitions that violate models prior to tree reconstruction. Our results also suggest that further effort in developing models that do not require SRH assumptions could lead to large improvements in the accuracy of phylogenomic inference. The scripts necessary to perform the analysis are available at https://github.com/roblanf/SRHtests, and the new tests we describe are available as a new option in IQ-TREE (http://www.iqtree.org). |
Database: | OpenAIRE |
External link: |
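
The description above refers to matched-pairs tests of homogeneity and lists "test of symmetry" among the subject terms. As a rough illustration of what such a test computes, the following minimal sketch applies Bowker's matched-pairs test of symmetry to a single pair of aligned nucleotide sequences. It is not the authors' implementation and does not reproduce the maximal matched-pairs procedure or the IQ-TREE option; the function names `bowker_symmetry_test` and `chi2_sf` are hypothetical, and skipping sites with gaps or ambiguous characters is an assumption made here for simplicity.

```python
import math
from itertools import combinations

NUCLEOTIDES = "ACGT"


def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if x <= 0 or df <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def bowker_symmetry_test(seq1, seq2):
    """Bowker's test of symmetry for one pair of aligned sequences.

    Returns (statistic, degrees_of_freedom, p_value). Sites where either
    sequence has a character outside A, C, G, T are skipped (an assumption
    of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    # counts[(a, b)]: number of sites with state a in seq1 and b in seq2
    counts = {(a, b): 0 for a in NUCLEOTIDES for b in NUCLEOTIDES}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            counts[(a, b)] += 1

    stat, df = 0.0, 0
    for a, b in combinations(NUCLEOTIDES, 2):
        total = counts[(a, b)] + counts[(b, a)]
        if total > 0:  # only off-diagonal pairs with observations contribute
            stat += (counts[(a, b)] - counts[(b, a)]) ** 2 / total
            df += 1
    return stat, df, chi2_sf(stat, df)


if __name__ == "__main__":
    stat, df, p = bowker_symmetry_test("AAAACCGT", "CCCCCCGT")
    print(f"statistic={stat:.3f}, df={df}, p={p:.4f}")
```

Under this sketch, a small p-value indicates that the divergence matrix of the pair departs from symmetry, which is the kind of evidence against the SRH assumptions that the description discusses. The analysis scripts in the repository linked in the description are the reference for the procedure actually used.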