From pairs of most similar sequences to phylogenetic best matches.

Authors: Stadler PF (Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany; Competence Center for Scalable Data Services and Solutions Dresden/Leipzig, Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv), and Leipzig Research Center for Civilization Diseases, Universität Leipzig, Augustusplatz 12, 04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany; Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090 Vienna, Austria; Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, 111321 Bogotá, D.C., Colombia; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA), Geiß M (Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany; Software Competence Center Hagenberg GmbH, Softwarepark 21, 4232 Hagenberg, Austria), Schaller D (Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany), López Sánchez A (CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO, México), González Laffitte M (CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO, México), Valdivia DI (Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Km. 9.6 Libramiento Norte Carretera Irapuato-León, 36821 Irapuato, GTO, México), Hellmuth M (School of Computing, University of Leeds, E C Stoner Building, Leeds, LS2 9JT, UK), Hernández Rosales M (CONACYT-Instituto de Matemáticas, UNAM Juriquilla, Blvd. Juriquilla 3001, 76230 Juriquilla, Querétaro, QRO, México).
Language: English
Source: Algorithms for molecular biology: AMB [Algorithms Mol Biol] 2020 Apr 09; Vol. 15, pp. 5. Date of Electronic Publication: 2020 Apr 09 (Print Publication: 2020).
DOI: 10.1186/s13015-020-00165-2
Abstract: Background: Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods.
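To make the difference between best hits and best matches concrete, the following minimal Python sketch works on a hypothetical five-node gene tree. The node names `D`, `S`, `a1`, `b1`, `b2` and all helper functions are illustrative assumptions, not taken from the paper or from AsymmeTree. The sketch treats the best hits of a gene `x` as the candidates at minimal path-length distance from `x`. Its best matches are the candidates whose last common ancestor (LCA) with `x` is lowest. With equal rates (an ultrametric tree) the two notions coincide. Letting `b1` evolve five times faster makes `b2` the best hit, although `b1` is still the best match.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy gene tree encoding: node -> (parent, length of edge to parent).
Tree = Dict[str, Tuple[Optional[str], float]]


def path_to_root(tree: Tree, v: str) -> List[str]:
    """Nodes from v up to the root, v first."""
    path = [v]
    parent = tree[v][0]
    while parent is not None:
        path.append(parent)
        parent = tree[parent][0]
    return path


def lca(tree: Tree, x: str, y: str) -> str:
    """Last common ancestor: first ancestor of x that is also an ancestor of y."""
    ancestors_y = set(path_to_root(tree, y))
    return next(v for v in path_to_root(tree, x) if v in ancestors_y)


def distance(tree: Tree, x: str, y: str) -> float:
    """Additive (path-length) distance between two nodes."""
    w = lca(tree, x, y)
    total = 0.0
    for v in (x, y):
        while v != w:  # climb from v to the LCA, summing edge lengths
            parent, length = tree[v]
            total += length
            v = parent
    return total


def best_hits(tree: Tree, x: str, candidates: List[str]) -> List[str]:
    """Candidates at minimal distance from x (most similar sequences)."""
    d = {y: distance(tree, x, y) for y in candidates}
    m = min(d.values())
    return sorted(y for y in candidates if d[y] == m)


def best_matches(tree: Tree, x: str, candidates: List[str]) -> List[str]:
    """Candidates whose LCA with x is lowest (closest relatives).

    Every lca(x, y) lies on the path from x to the root, so these LCAs are
    totally ordered and can be compared by their position on that path.
    """
    pos = {v: i for i, v in enumerate(path_to_root(tree, x))}
    level = {y: pos[lca(tree, x, y)] for y in candidates}
    m = min(level.values())
    return sorted(y for y in candidates if level[y] == m)


# Hypothetical gene tree: duplication D at the root, speciation S below it;
# a1 lives in species A, b1 and b2 in species B (A's copy on the b2 side was lost).
ultrametric: Tree = {
    "D": (None, 0.0), "S": ("D", 1.0),
    "a1": ("S", 1.0), "b1": ("S", 1.0), "b2": ("D", 2.0),
}
# Same topology, but b1 evolves five times faster after speciation.
rate_asymmetric: Tree = dict(ultrametric, b1=("S", 5.0))

for name, tree in [("ultrametric", ultrametric), ("rate-asymmetric", rate_asymmetric)]:
    print(name, best_hits(tree, "a1", ["b1", "b2"]), best_matches(tree, "a1", ["b1", "b2"]))
# ultrametric ['b1'] ['b1']
# rate-asymmetric ['b2'] ['b1']
```

Comparing LCA positions along a single root path is enough here because all ancestors that `x` shares with other genes lie on that path.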
Results: If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes, provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic, since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distance data have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches.
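The quartet idea can be illustrated with the four-point condition. For an additive (tree) metric, the split xy|uv of a quartet is the pairing with the smallest sum d(x,y) + d(u,v). The sketch below is a minimal, generic four-point test, not the implementation in AsymmeTree. It assumes exact additive distances and an outgroup `z` already known for `x`, `y`, `y2`. The helper names, the tolerance `eps`, and the distance values are hypothetical. The values were read off a toy gene tree in which `b1` evolves fast.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical pairwise distance table keyed by unordered gene pairs.
Dist = Dict[FrozenSet[str], float]


def dist(d: Dist, u: str, v: str) -> float:
    """Symmetric lookup in a pairwise distance table."""
    return d[frozenset((u, v))]


def closer_relative(d: Dist, x: str, y: str, y2: str, z: str,
                    eps: float = 1e-9) -> Optional[str]:
    """Decide whether y or y2 is the closer relative of x, given outgroup z.

    Four-point condition: for an additive metric, the quartet split uv|wt is
    the pairing with the smallest sum d(u,v) + d(w,t). Because z is assumed
    to be an outgroup to x, y, y2, split xy|y2z means lca(x,y) lies strictly
    below lca(x,y2), and split xz|yy2 means the two LCAs coincide.
    Returns None for a tie or an unresolved quartet.
    """
    sums = {
        y: dist(d, x, y) + dist(d, y2, z),      # split xy | y2 z
        y2: dist(d, x, y2) + dist(d, y, z),     # split xy2 | y z
        None: dist(d, x, z) + dist(d, y, y2),   # split xz | y y2 -> tie
    }
    smallest = min(sums.values())
    winners = [k for k, s in sums.items() if s - smallest <= eps]
    return winners[0] if len(winners) == 1 else None


# Hypothetical additive distances: a1 (species A); b1, b2 (species B), where
# b1 evolves fast; z is an outgroup gene.
D: Dist = {frozenset(pair): value for pair, value in [
    (("a1", "b1"), 6.0), (("a1", "b2"), 4.0), (("a1", "z"), 6.0),
    (("b1", "b2"), 8.0), (("b1", "z"), 10.0), (("b2", "z"), 6.0),
]}

print(closer_relative(D, "a1", "b1", "b2", "z"))  # b1
```

Here the three sums are 12, 14 and 14, so the split is a1 b1 | b2 z. The test therefore identifies `b1` as the closer relative of `a1`, even though d(a1, b2) = 4 < d(a1, b1) = 6. In other words, the best hit `b2` is not the best match.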
Conclusion: Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations.
Availability: Accompanying software is available at https://github.com/david-schaller/AsymmeTree.
Competing Interests: The authors declare that they have no competing interests.
(© The Author(s) 2020.)
Database: MEDLINE