Approximate pattern matching on elastic-degenerate text
Autor: | Solon P. Pissis, Giovanna Rosone, Giulia Bernardini, Nadia Pisanti |
---|---|
Přispěvatelé: | Università degli Studi di Milano-Bicocca [Milano] (UNIMIB), Department of Computer Science [Pisa], University of Pisa - Università di Pisa, Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Centrum Wiskunde & Informatica (CWI), Centrum Wiskunde & Informatica, Amsterdam (CWI), The Netherlands, Bernardini, G, Pisanti, N, Pissis, Sp, Rosone, G, Pissis, S, Università degli Studi di Milano-Bicocca = University of Milano-Bicocca (UNIMIB) |
Jazyk: | angličtina |
Rok vydání: | 2019 |
Předmět: |
Uncertain sequences
Degenerate strings Elastic-degenerate strings Pattern matching Pan-genome uncertain sequences General Computer Science Elastic-degenerate string [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] 0102 computer and information sciences 02 engineering and technology 01 natural sciences Theoretical Computer Science Combinatorics Simple (abstract algebra) Degenerate string 0202 electrical engineering electronic engineering information engineering C++ string handling ComputingMilieux_MISCELLANEOUS Mathematics degenerate strings Sequence Multiple sequence alignment Hamming distance Substring Uncertain sequence pattern matching 010201 computation theory & mathematics 020201 artificial intelligence & image processing Edit distance Uncertain sequences Degenerate strings Elastic-degenerate strings Pattern matching Pan-genome pan-genome elastic-degenerate strings |
Zdroj: | Theoretical Computer Science Theoretical Computer Science, Elsevier, 2019, pp.1-30. ⟨10.1016/j.tcs.2019.08.012⟩ Theoretical Computer Science, 812, 109-122 Theoretical Computer Science, 2019, pp.1-30. ⟨10.1016/j.tcs.2019.08.012⟩ |
ISSN: | 1879-2294 0304-3975 |
DOI: | 10.1016/j.tcs.2019.08.012⟩ |
Popis: | An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g., pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists a non-combinatorial O(nm1.381+N)-time algorithm to solve this problem on-line [1]. In this paper, we study the same problem under the edit distance model and present an O(k2mG+kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG+kN)-time and O(m)-space algorithm for solving the problem under Hamming distance. |
Databáze: | OpenAIRE |
Externí odkaz: |