Terraces in Species Tree Inference from Gene Trees

Authors: Mursalin Habib, Atif Hasan Rahman, Md. Shamsuzzoha Bayzid
Year of publication: 2022
DOI: 10.1101/2022.11.21.517454
Description: A terrace in a phylogenetic tree space is a region where all trees contain the same set of subtrees because of certain patterns of missing data among the sampled taxa, so that every tree in the region has an identical optimality score for a given data set. Terraces were first investigated in the context of phylogenetic tree estimation from sequence alignments using maximum likelihood (ML) and maximum parsimony (MP). The concept was later extended to species tree inference from a collection of gene trees, where a set of equally optimal species trees was referred to as a “pseudo” species tree terrace. Pseudo terraces do not consider the topological proximity of the trees in terms of the induced subtrees that result from certain patterns of missing data. In this study, we mathematically characterize species tree terraces and investigate the properties and conditions under which multiple species trees induce (display) an identical set of locus-specific subtrees owing to missing data and therefore, as a consequence of this combinatorial structure, receive the same optimality score. We present analytical results and combinatorial characteristics of species tree terraces. We report that species tree terraces are agnostic to gene tree topologies and the discordance among them, even though accounting for that discordance is central to developing statistically consistent species tree estimation techniques based on gene tree distributions. We therefore introduce and characterize a special type of gene tree topology-aware terrace, which we call a “peak terrace”, highlight its impact and importance in species tree inference, and investigate the conditions on the patterns of missing data that give rise to peak terraces.
Database: OpenAIRE
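
Illustration: the description states that several species trees can induce an identical set of locus-specific subtrees when taxa are missing from some loci. The sketch below is a minimal, simplified illustration of that idea and is not the authors' method. It assumes rooted trees written as nested tuples, and the names `restrict`, `induced_subtrees`, and `same_terrace`, as well as the four-taxon example, are hypothetical and chosen only for this illustration.

```python
def restrict(tree, taxa):
    """Restrict a rooted tree (nested tuples, leaves are strings) to `taxa`.

    Nodes left with a single child are suppressed. The result is an
    order-independent form built from frozensets, or None if no leaf remains.
    """
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kids = [r for r in (restrict(child, taxa) for child in tree) if r is not None]
    if not kids:
        return None
    if len(kids) == 1:
        return kids[0]
    return frozenset(kids)


def induced_subtrees(tree, coverage):
    """Return the subtree that `tree` induces on the taxon set of each locus."""
    return tuple(restrict(tree, frozenset(taxa)) for taxa in coverage)


def same_terrace(tree_a, tree_b, coverage):
    """True if both trees induce identical subtrees for every locus."""
    return induced_subtrees(tree_a, coverage) == induced_subtrees(tree_b, coverage)


# Hypothetical taxon coverage: locus 1 lacks D, locus 2 lacks C.
coverage = [{"A", "B", "C"}, {"A", "B", "D"}]

t1 = ((("A", "B"), "C"), "D")
t2 = (("A", "B"), ("C", "D"))
t3 = ((("A", "C"), "B"), "D")

print(same_terrace(t1, t2, coverage))
print(same_terrace(t1, t3, coverage))
```

In this example `t1` and `t2` differ as full trees but agree on every locus-specific subtree, so any score computed only from those induced subtrees cannot distinguish them, whereas `t3` already differs on the first locus.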