Rate matrices for analyzing large families of protein sequences
Authors: | Alain Hénaut, Jean-Loup Risler, Claudine Devauchelle, Bruno Torrésani, Monique Monnerot, Matthias Holschneider, Alexander Grossmann |
---|---|
Contributors: | Laboratoire Génome et Informatique, Centre National de la Recherche Scientifique (CNRS), Centre de Physique Théorique - UMR 6207 (CPT), Université de la Méditerranée - Aix-Marseille 2-Université de Provence - Aix-Marseille 1-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS), Centre de génétique moléculaire (CGM), Université Paris-Sud - Paris 11 (UP11)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'Analyse, Topologie, Probabilités (LATP), Université Paul Cézanne - Aix-Marseille 3-Université de Provence - Aix-Marseille 1-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Publication year: | 2004 |
Subject: |
0106 biological sciences; Markov model; DNA Mitochondrial; 010603 evolutionary biology; 01 natural sciences; Evolution Molecular; Combinatorics; 03 medical and health sciences; Matrix (mathematics); Tree (descriptive set theory); Sequence Analysis Protein; Genetics; Computer Simulation; Divergence (statistics); Molecular Biology; Phylogeny; 030304 developmental biology; Mathematics; Stochastic Processes; 0303 health sciences; Multiple sequence alignment; Markov chain; Stochastic process; Computational Biology; Proteins; Markov Chains; Computational Mathematics; Computational Theory and Mathematics; Modeling and Simulation; Principal component analysis; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Sequence Alignment; Algorithm |
Source: | Journal of Computational Biology, Mary Ann Liebert, 2004, 8 (4), pp.381-399. ⟨10.1089/106652701752236205⟩ |
ISSN: | 1066-5277 1557-8666 |
DOI: | 10.1089/106652701752236205 |
Description: | We propose and study a new approach for the analysis of families of protein sequences. This method is related to the LogDet distances used in phylogenetic reconstructions; it can be viewed as an attempt to embed these distances into a multidimensional framework. The proposed method starts by associating a Markov matrix to each pairwise alignment deduced from a given multiple alignment. The central objects under consideration here are matrix-valued logarithms L of these Markov matrices, which exist under conditions that are compatible with fairly large divergence between the sequences. These logarithms allow us to compare data from a family of aligned proteins with simple models (in particular, continuous reversible Markov models) and to test the adequacy of such models. If one neglects fluctuations arising from the finite length of sequences, any continuous reversible Markov model with a single rate matrix Q over an arbitrary tree predicts that all the observed matrices L are multiples of Q. Our method exploits this fact, without relying on any tree estimation. We test this prediction on a family of proteins encoded by the mitochondrial genome of 26 multicellular animals, which include vertebrates, arthropods, echinoderms, molluscs, and nematodes. A principal component analysis of the observed matrices L shows that a single rate model can be used as a rough approximation to the data, but that systematic deviations from any such model are unmistakable and related to the evolutionary history of the species under consideration. |
Database: | OpenAIRE |
External link: |
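
Note: the prediction stated in the description, that under a single reversible rate matrix Q every matrix logarithm L is a multiple of Q, can be illustrated on a toy case. The sketch below is not the authors' code. The 3-state matrix `Q`, the divergence times, and all helper names are hypothetical, and the series-based `expm` and `logm` are simple stand-ins chosen only to keep the example free of dependencies.

```python
# Illustrative sketch only: a hypothetical 3-state model, not the authors' data or code.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def scale(a, c):
    return [[c * x for x in row] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def expm(a, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small norms)."""
    n = len(a)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = scale(matmul(term, a), 1.0 / k)  # term = a^k / k!
        result = add(result, term)
    return result

def logm(p, terms=200):
    """Matrix logarithm via log(I + X) = X - X^2/2 + X^3/3 - ...

    The series converges only when the eigenvalues of X = P - I lie inside
    the unit disc, which holds for the toy matrices used here.
    """
    n = len(p)
    x = add(p, scale(identity(n), -1.0))
    result = [[0.0] * n for _ in range(n)]
    power = identity(n)
    for k in range(1, terms):
        power = matmul(power, x)  # power = X^k
        result = add(result, scale(power, (-1.0) ** (k + 1) / k))
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical rate matrix: symmetric (hence reversible with a uniform
# stationary distribution), with every row summing to zero.
Q = [[-0.3, 0.2, 0.1],
     [0.2, -0.5, 0.3],
     [0.1, 0.3, -0.4]]

if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):       # assumed divergence times
        P = expm(scale(Q, t))       # Markov matrix after time t
        L = logm(P)                 # the model predicts L = t * Q
        print(t, max_abs_diff(L, scale(Q, t)))
```

With real data the Markov matrices would be estimated from pairwise alignments rather than generated from a known Q, so the matrices L would be multiples of a common matrix only approximately. A library routine such as `scipy.linalg.logm` would be the more usual choice for the logarithm.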