Pattern matching on elastic-degenerate text with errors
Autor: | Giulia Bernardini, Giovanna Rosone, Nadia Pisanti, Solon P. Pissis |
---|---|
Přispěvatelé: | Bernardini, G, Pisanti, N, Pissis, Sp, Rosone, G, Dipartimento di Matematica [Pisa], University of Pisa - Università di Pisa, Dipartimento di Informatica [Pisa], Equipe de recherche européenne en algorithmique et biologie formelle et expérimentale (ERABLE), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria), Department of Informatics [King's College London], King‘s College London, Pissis, S |
Jazyk: | angličtina |
Rok vydání: | 2017 |
Předmět: |
0301 basic medicine
Uncertain sequences Elastic-degenerate strings Degenerate strings Pan-genome Pattern matching Elastic-degenerate string indeterminate strings Computer science [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] 0102 computer and information sciences String processing Pan-genome 01 natural sciences 03 medical and health sciences Degenerate string Pattern matching elastic-degenerate strings pan-genome indeterminate strings degenerate strings degenerate strings Information retrieval Spire (mollusc) INF/01 - INFORMATICA Uncertain sequences Elastic-degenerate strings Degenerate strings Uncertain sequence 030104 developmental biology pattern matching 010201 computation theory & mathematics pan-genome [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM] elastic-degenerate strings uncertain strings |
Zdroj: | String Processing and Information Retrieval ISBN: 9783319674278 LNCS SPIRE 2017-24th International Symposium on String Processing and Information Retrieval SPIRE 2017-24th International Symposium on String Processing and Information Retrieval, Sep 2017, Palermo, Italy. pp.74-90, ⟨10.1007/978-3-319-67428-5_7⟩ |
DOI: | 10.1007/978-3-319-67428-5_7 |
Popis: | International audience; An elastic-degenerate string is a sequence of n sets of strings of total length N. It has been introduced to represent a multiple alignment of several closely-related sequences (e.g. pan-genome) compactly. In this representation, substrings of these sequences that match exactly are collapsed, while in positions where the sequences differ, all possible variants observed at that location are listed. The natural problem that arises is finding all matches of a deterministic pattern of length m in an elastic-degenerate text. There exists an O(nm 2 + N)-time algorithm to solve this problem on-line after a pre-processing stage with time and space O(m). In this paper, we study the same problem under the edit distance model and present an O(k 2 mG + kN)-time and O(m)-space algorithm, where G is the total number of strings in the elastic-degenerate text and k is the maximum edit distance allowed. We also present a simple O(kmG + kN)-time and O(m)-space algorithm for Hamming distance. |
Databáze: | OpenAIRE |
Externí odkaz: |