Accounting for ambiguity in ancestral sequence reconstruction

Authors:	Laurent Brehelin, Olivier Gascuel, Adrien Oliva, Sylvain Pulicani, Stéphane Guindon, Vincent Lefort
Contributors:	Méthodes et Algorithmes pour la Bioinformatique (MAB), Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM); Australian Centre for Ancient DNA, University of Adelaide; Bioinformatique évolutive - Evolutionary Bioinformatics, Institut Pasteur [Paris] (IP)-Centre National de la Recherche Scientifique (CNRS)
Funding:	This research was supported by the Institut Français de Bioinformatique (IFB, formerly RENABI-IFB; Investissements d'Avenir, ANR-11-INBS-0013, 2011) and by the Agence Nationale pour la Recherche through the project GenoSpace (ANR-16-CE02-0008, 2016: new statistical tools for the spatial analysis of genetic data).
Language:	English
Year of publication:	2019
Subject:	0106 biological sciences; Statistics and Probability; Biometry; Computer science; Inference; Scale (descriptive set theory); [SDV.BID.SPT]Life Sciences [q-bio]/Biodiversity/Systematics, Phylogenetics and taxonomy; 010603 evolutionary biology; 01 natural sciences; Biochemistry; Set (abstract data type); Evolution, Molecular; 03 medical and health sciences; Molecular evolution; Position (vector); Maximum a posteriori estimation; Nucleotide; Amino Acid Sequence; Molecular Biology; Phylogeny; 030304 developmental biology; 0303 health sciences; Sequence; Likelihood Functions; Phylogenetic tree; [SDV.BID.EVO]Life Sciences [q-bio]/Biodiversity/Populations and Evolution [q-bio.PE]; Ambiguity; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Computer Science Applications; Computational Mathematics; Tree (data structure); Computational Theory and Mathematics; Algorithm; Sequence Analysis
Source:	Bioinformatics, Oxford University Press (OUP), 2019, 35 (21), pp. 4290-4297. doi:10.1093/bioinformatics/btz249
ISSN:	1367-4803 1367-4811
DOI:	10.1093/bioinformatics/btz249
Description:	Motivation: The reconstruction of ancestral genetic sequences from the analysis of contemporaneous data is a powerful tool to improve our understanding of molecular evolution. Various statistical criteria defined in a phylogenetic framework can be used to infer nucleotide, amino-acid or codon states at internal nodes of the tree, for every position along the sequence. These criteria generally select the state that maximizes (or minimizes) a given criterion. Although this is perfectly sensible from a statistical perspective, that strategy fails to convey useful information about the level of uncertainty associated with the inference.
Results: The present study introduces a new criterion for ancestral sequence reconstruction, the minimum posterior expected error (MPEE), which selects a single state whenever the signal conveyed by the data is strong, and a combination of multiple states otherwise. We also assess the performance of a criterion based on the Brier scoring scheme which, like MPEE, does not rely on any tuning parameters. The precision and accuracy of several other criteria that involve arbitrarily set tuning parameters are also evaluated. Large-scale simulations demonstrate the benefits of the MPEE and Brier-based criteria: compared to the standard approach, they substantially increase the accuracy of the inference of past sequences, at the cost of realistic compromises on the precision of the returned solutions.
Availability and implementation: The software package PhyML (https://github.com/stephaneguindon/phyml) provides an implementation of the Maximum A Posteriori (MAP) and MPEE criteria for reconstructing ancestral nucleotide and amino-acid sequences.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d6be3b9f8e69cd8aea634f31a18984a0 ; https://hal-pasteur.archives-ouvertes.fr/pasteur-02404399/file/ancestral_clean.pdf
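The description contrasts MAP, which returns one state per site, with criteria such as MPEE and the Brier-based rule, which may return a set of states when the data are ambiguous. The exact MPEE and Brier-based definitions are given in the full paper and are not reproduced in this record. The Python sketch below is therefore only an illustration under stated assumptions: it takes per-site posterior probabilities as already computed, and it scores a candidate set S as a uniform forecast (probability 1/|S| on each member) under the Brier score, whose posterior expectation works out to (1 - 2P(S))/|S| + 1. The function names `map_state`, `brier_set` and `threshold_set`, and the 0.95 default threshold, are hypothetical and are not PhyML's API.

```python
"""Illustrative per-site ancestral state selection (hypothetical sketch).

Assumes the posterior probability of each state at one ancestral site is
already available (e.g. from a phylogenetic likelihood computation).
The Brier-based rule below is an assumption for illustration and is not
necessarily the criterion defined in the paper or implemented in PhyML.
"""
from typing import Dict, List


def map_state(posterior: Dict[str, float]) -> str:
    """Maximum a posteriori (MAP) state: the single most probable state."""
    return max(posterior, key=lambda s: posterior[s])


def rank_states(posterior: Dict[str, float]) -> List[str]:
    """States sorted from most to least probable."""
    return sorted(posterior, key=lambda s: posterior[s], reverse=True)


def expected_brier(posterior: Dict[str, float], states: List[str]) -> float:
    """Posterior expected Brier score of a uniform forecast over `states`.

    Forecast q_k = 1/|S| for k in S, 0 otherwise; the Brier score for true
    state x is sum_k (q_k - [k == x])**2. Taking the expectation under the
    posterior gives (1 - 2 * P(S)) / |S| + 1.
    """
    size = len(states)
    mass = sum(posterior[s] for s in states)
    return (1.0 - 2.0 * mass) / size + 1.0


def brier_set(posterior: Dict[str, float]) -> List[str]:
    """Hypothetical tuning-free rule: set minimizing expected Brier score.

    For a fixed size m the best set is the m most probable states, so only
    the nested top-m sets need to be compared. Ties keep the smaller set.
    """
    ranked = rank_states(posterior)
    best = ranked[:1]
    best_score = expected_brier(posterior, best)
    for m in range(2, len(ranked) + 1):
        score = expected_brier(posterior, ranked[:m])
        if score < best_score:
            best, best_score = ranked[:m], score
    return best


def threshold_set(posterior: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Hypothetical tuned rule: smallest top-ranked set with mass >= threshold."""
    chosen: List[str] = []
    mass = 0.0
    for s in rank_states(posterior):
        chosen.append(s)
        mass += posterior[s]
        if mass >= threshold:
            break
    return chosen


if __name__ == "__main__":
    confident = {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02}
    ambiguous = {"A": 0.40, "C": 0.35, "G": 0.15, "T": 0.10}
    for name, post in (("confident", confident), ("ambiguous", ambiguous)):
        print(name, map_state(post), brier_set(post), threshold_set(post))
```

With the confident posterior the Brier-style rule returns the MAP state alone; with the diffuse one it returns three states without any user-set parameter, whereas the output of `threshold_set` depends entirely on the chosen threshold.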