phyBWT: Alignment-Free Phylogeny via eBWT Positional Clustering
Author: | Guerrini, Veronica, Conte, Alessio, Grossi, Roberto, Liti, Gianni, Rosone, Giovanna, Tattini, Lorenzo |
---|---|
Contributors: | University of Pisa - Università di Pisa, Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Laboratoire de Biométrie et Biologie Evolutive - UMR 5558 (LBBE), Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon-Institut National de Recherche en Informatique et en Automatique (Inria)-VetAgro Sup - Institut national d'enseignement supérieur et de recherche en alimentation, santé animale, sciences agronomiques et de l'environnement (VAS)-Centre National de la Recherche Scientifique (CNRS)-Inria Lyon, Institut National de Recherche en Informatique et en Automatique (Inria), CNRS UMR 7284, Inserm U 1081, Université Côte d'Azur |
Language: | English |
Publication year: | 2022 |
Subject: | |
Source: | WABI 2022 - 22nd International Workshop on Algorithms in Bioinformatics, 2022, Berlin/Potsdam, Germany. ⟨10.4230/LIPIcs.WABI.2022.23⟩ |
DOI: | 10.4230/LIPIcs.WABI.2022.23 |
Description: | Molecular phylogenetics is a fundamental branch of biology. It studies the evolutionary relationships among the individuals of a population through their biological sequences, and may provide insights into the origin and evolution of viral diseases, or highlight complex evolutionary trajectories. In this paper we develop a method called phyBWT, which uses the extended Burrows-Wheeler Transform (eBWT) of a collection of DNA sequences to reconstruct phylogeny directly, bypassing both alignment against a reference genome and de novo assembly. phyBWT hinges on the combinatorial properties of the eBWT positional clustering framework. We employ the eBWT to detect relevant blocks of the longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori), and build a suitable decomposition that leads, step by step, to a phylogenetic tree. As a result, phyBWT is a new alignment-, assembly-, and reference-free method that builds a partition tree without relying on pairwise comparison of sequences, thus avoiding the need for a distance matrix to infer phylogeny. Preliminary experimental results on sequencing data show that our method can handle datasets of different types (short reads, contigs, or entire genomes), producing trees of quality comparable to that of the benchmark phylogeny. LIPIcs, Vol. 242, 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022), pages 23:1-23:19 |
Database: | OpenAIRE |
External link: |
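
To illustrate the data structure named in the description, here is a minimal sketch of a Burrows-Wheeler Transform over a collection of strings. It is not the phyBWT implementation: it assumes the common end-marker variant (each sequence gets its own terminator, ordered by sequence index), it sorts suffixes naively, and the function name `ebwt_with_colors` and the "color" array (the index of the source sequence for each position) are illustrative choices.

```python
def ebwt_with_colors(seqs):
    """Naive multi-string BWT (end-marker variant, an assumption of this sketch).

    Returns the transformed string and, for each of its positions, the index
    ("color") of the sequence the corresponding suffix comes from.
    """
    entries = []
    for idx, s in enumerate(seqs):
        # One suffix per position, including the suffix made of the end marker alone.
        for pos in range(len(s) + 1):
            # Letters rank above every end marker; end markers rank by sequence index.
            key = tuple((1, ord(c)) for c in s[pos:]) + ((0, idx),)
            # The symbol preceding the suffix; the whole sequence is preceded by its marker.
            prev = s[pos - 1] if pos > 0 else "$"
            entries.append((key, prev, idx))
    entries.sort(key=lambda e: e[0])
    bwt = "".join(e[1] for e in entries)
    colors = [e[2] for e in entries]
    return bwt, colors


bwt, colors = ebwt_with_colors(["ACA", "ACG"])
print(bwt)     # AGC$$AAC
print(colors)  # [0, 1, 0, 0, 1, 0, 1, 1]
```

In this toy input, the two suffixes that start with the shared substring `AC` end up adjacent in sorted order (positions 3 and 4, counting from 0) and carry colors 0 and 1. This adjacency of suffixes that share a prefix is the property that lets shared substrings of varying length be read off as contiguous blocks, rather than fixing a length k in advance.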