Accounting for ambiguity in ancestral sequence reconstruction

Authors: Laurent Brehelin, Olivier Gascuel, Adrien Oliva, Sylvain Pulicani, Stéphane Guindon, Vincent Lefort
Contributors: Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM), Australian Centre for Ancient DNA, University of Adelaide, Bioinformatique évolutive - Evolutionary Bioinformatics, Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS), ANR-11-INBS-0013, IFB (ex Renabi-IFB), Institut français de bioinformatique (2011), ANR-16-CE02-0008, GenoSpace, Nouveaux outils statistiques pour l'analyse spatiale des données génétiques (2016). This research was supported by the Institut Français de Bioinformatique (RENABI-IFB, Investissements d’Avenir, ANR-11-INBS-0013) and the Agence Nationale pour la Recherche through the project GENOSPACE.
Language: English
Year of publication: 2019
Subject:
0106 biological sciences
Statistics and Probability
Biometry
Computer science
Inference
Scale (descriptive set theory)
[SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics
Phylogenetics and taxonomy
010603 evolutionary biology
01 natural sciences
Biochemistry
Set (abstract data type)
Evolution, Molecular
03 medical and health sciences
Molecular evolution
Position (vector)
Maximum a posteriori estimation
Nucleotide
Amino Acid Sequence
Molecular Biology
Phylogeny
030304 developmental biology
0303 health sciences
Sequence
Likelihood Functions
Phylogenetic tree
[SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE]
Ambiguity
[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]
Computer Science Applications
Computational Mathematics
Tree (data structure)
Computational Theory and Mathematics
chemistry
Algorithm
Sequence Analysis
Source: Bioinformatics
Bioinformatics, Oxford University Press (OUP), 2019, 35 (21), pp.4290-4297. ⟨10.1093/bioinformatics/btz249⟩
ISSN: 1367-4803 (print), 1367-4811 (online)
DOI: 10.1093/bioinformatics/btz249
Description:
Motivation: The reconstruction of ancestral genetic sequences from the analysis of contemporaneous data is a powerful tool for improving our understanding of molecular evolution. Various statistical criteria defined in a phylogenetic framework can be used to infer nucleotide, amino-acid or codon states at the internal nodes of the tree, for every position along the sequence. These criteria generally select the state that maximizes (or minimizes) a given objective function. Although this strategy is perfectly sensible from a statistical perspective, it fails to convey useful information about the level of uncertainty associated with the inference.
Results: The present study introduces a new criterion for ancestral sequence reconstruction, the minimum posterior expected error (MPEE), which selects a single state whenever the signal conveyed by the data is strong, and a combination of multiple states otherwise. We also assess the performance of a criterion based on the Brier scoring scheme which, like MPEE, does not rely on any tuning parameters. The precision and accuracy of several other criteria that involve arbitrarily set tuning parameters are also evaluated. Large-scale simulations demonstrate the benefits of the MPEE and Brier-based criteria, with a substantial increase in the accuracy of the inferred past sequences compared with the standard approach and realistic compromises on the precision of the solutions returned.
Availability and implementation: The software package PhyML (https://github.com/stephaneguindon/phyml) provides an implementation of the Maximum A Posteriori (MAP) and MPEE criteria for reconstructing ancestral nucleotide and amino-acid sequences.
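To make the contrast between a single-state MAP call and a set-valued call concrete, here is a minimal Python sketch for one nucleotide site. It is not the paper's or PhyML's implementation, and it does not implement MPEE, whose loss function is not given in this record. It assumes one plausible reading of a Brier-based rule: a set S of k states is scored as a uniform prediction over its members, and S is chosen to minimise the expected Brier score under the site's posterior probabilities. Under that assumption, if P is the total posterior mass of S, the expected score is 1 + (1 - 2P)/k, so for each k the best set is simply the k most probable states. The names `map_state`, `expected_brier`, `brier_set` and `IUPAC` are hypothetical.

```python
"""Hypothetical sketch: MAP vs. a Brier-style set-valued call at one site.

Assumption (not taken from the paper): a set S is scored as a uniform
distribution over its members, and S minimises the expected Brier score.
"""

# Standard IUPAC codes for every non-empty subset of {A, C, G, T}.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def map_state(posterior: dict[str, float]) -> str:
    """Maximum a posteriori call: the single most probable state."""
    return max(posterior, key=posterior.get)


def expected_brier(cum_prob: float, k: int) -> float:
    """Expected Brier score of a uniform prediction over k states whose
    total posterior mass is cum_prob: 1 + (1 - 2 * cum_prob) / k."""
    return 1.0 + (1.0 - 2.0 * cum_prob) / k


def brier_set(posterior: dict[str, float]) -> frozenset[str]:
    """Return the set of states with the lowest expected Brier score.

    For a fixed size k the best set is the k most probable states, so only
    len(posterior) candidate sets are compared. Ties keep the smaller set.
    """
    ranked = sorted(posterior, key=posterior.get, reverse=True)
    best_set, best_score = frozenset(), float("inf")
    cum = 0.0
    for k, state in enumerate(ranked, start=1):
        cum += posterior[state]
        score = expected_brier(cum, k)
        if score < best_score:
            best_set, best_score = frozenset(ranked[:k]), score
    return best_set


if __name__ == "__main__":
    strong = {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02}  # clear signal
    weak = {"A": 0.50, "G": 0.45, "C": 0.03, "T": 0.02}    # A/G ambiguity
    for site in (strong, weak):
        print(map_state(site), IUPAC[brier_set(site)])
    # prints "A A" then "A R"
```

With a strong signal both rules return `A`; when A and G are nearly tied, MAP still commits to `A`, whereas the set-valued rule returns {A, G}, written here as the IUPAC ambiguity code `R`.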
Database: OpenAIRE