Algorithms to compute the Burrows-Wheeler Similarity Distribution
Autor: | Felipe A. Louza, Guilherme P. Telles, Simon Gog, Liang Zhao |
---|---|
Rok vydání: | 2019 |
Předmět: |
FOS: Computer and information sciences
COMPUTABILIDADE E COMPLEXIDADE General Computer Science Burrows–Wheeler transform Computer science Concatenation String (computer science) Search engine indexing Data_CODINGANDINFORMATIONTHEORY 0102 computer and information sciences 02 engineering and technology Data structure 01 natural sciences Theoretical Computer Science Set (abstract data type) Transformation (function) Similarity (network science) 010201 computation theory & mathematics Computer Science - Data Structures and Algorithms 0202 electrical engineering electronic engineering information engineering Data Structures and Algorithms (cs.DS) 020201 artificial intelligence & image processing Algorithm Data compression |
Zdroj: | Repositório Institucional da USP (Biblioteca Digital da Produção Intelectual) Universidade de São Paulo (USP) instacron:USP |
ISSN: | 0304-3975 |
Popis: | The Burrows-Wheeler transform (BWT) is a well studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more their symbols are intermixed in the transformation, the more the strings are similar. In this article we present two new algorithms to compute similarity measures based on the BWT for string collections. In particular, we present practical and theoretical improvements to the computation of the Burrows-Wheeler similarity distribution for all pairs of strings in a collection. Our algorithms take advantage of the BWT computed for the concatenation of all strings, and use compressed data structures that allow reducing the running time with a small memory footprint, as shown by a set of experiments with real and artificial datasets. Accepted to TCS |
Databáze: | OpenAIRE |
Externí odkaz: |