Pattern matching on elastic-degenerate text with errors

Autor: Giulia Bernardini, Giovanna Rosone, Nadia Pisanti, Solon P. Pissis
Přispěvatelé: Bernardini, G, Pisanti, N, Pissis, Sp, Rosone, G, Dipartimento di Matematica [Pisa], University of Pisa - Università di Pisa, Dipartimento di Informatica [Pisa], Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Informatics [King's College London], King‘s College London, Pissis, S
Jazyk: angličtina
Rok vydání: 2017
Předmět:
0301 basic medicine
Uncertain sequences
Elastic-degenerate strings
Degenerate strings
Pan-genome
Pattern matching

Elastic-degenerate string
indeterminate strings
Computer science
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
0102 computer and information sciences
String processing
Pan-genome
01 natural sciences
03 medical and health sciences
Degenerate string
Pattern matching
elastic-degenerate strings
pan-genome
indeterminate strings
degenerate strings

degenerate strings
Information retrieval
Spire (mollusc)
INF/01 - INFORMATICA
Uncertain sequences
Elastic-degenerate strings
Degenerate strings
Uncertain sequence
030104 developmental biology
pattern matching
010201 computation theory & mathematics
pan-genome
[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]
elastic-degenerate strings
uncertain strings
Zdroj: String Processing and Information Retrieval ISBN: 9783319674278
LNCS
SPIRE 2017-24th International Symposium on String Processing and Information Retrieval
SPIRE 2017-24th International Symposium on String Processing and Information Retrieval, Sep 2017, Palermo, Italy. pp.74-90, ⟨10.1007/978-3-319-67428-5_7⟩
DOI: 10.1007/978-3-319-67428-5_7
Popis: International audience; An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g. pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists an O(nm 2 + N)-time algorithm to solve this problem on-line after a pre-processing stage with time and space O(m). In this paper, we study the same problem under the edit distance model and present an O(k 2 mG + kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG + kN)-time and O(m)-space algorithm for Hamming distance.
Databáze: OpenAIRE