Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching
Autor: | Solon P. Pissis, Yann Ramusat, Maxime Crochemore, Alice Héliou, Gregory Kucherov, Laurent Mouchard |
---|---|
Přispěvatelé: | Klasing, Ralf, Zeitoun, Marc, Informatics, King‘s College London, Laboratoire d'informatique de l'École polytechnique [Palaiseau] (LIX), Centre National de la Recherche Scientifique (CNRS)-École polytechnique (X), Laboratoire d'Informatique Gaspard-Monge (LIGM), Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM), Equipe Traitement de l'information en Biologie Santé (TIBS - LITIS), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA), Department of Informatics [King's College London], Université Paris sciences et lettres (PSL), École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS), Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS) |
Rok vydání: | 2017 |
Předmět: |
0301 basic medicine
[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS] Sigma 0102 computer and information sciences 01 natural sciences Running time Combinatorics 03 medical and health sciences 030104 developmental biology 010201 computation theory & mathematics Sliding window protocol Sequence comparison Line (geometry) Metric (mathematics) Pattern matching Algorithm Word (group theory) Mathematics |
Zdroj: | Fundamentals of Computation Theory ISBN: 9783662557501 FCT Crochemore, M, Heliou, A, Kucherov, G, Mouchard, L, Pissis, S P & Ramusat, Y 2017, Minimal Absent Words in a Sliding Window and Applications to Pattern Matching . in R Klasing & M Zeitoun (eds), Fundamentals of Computation Theory: 21st International Symposium, FCT 2017, Bordeaux, France, September 11--13, 2017, Proceedings . vol. 10472, Lecture Notes in Computer Science, Berlin, Heidelberg, pp. 164-176 . https://doi.org/10.1007/978-3-662-55751-8_14 Fundamentals of Computation Theory Fundamentals of Computation Theory, Sep 2017, Bordeaux, France. pp.164-176, ⟨10.1007/978-3-662-55751-8_14⟩ |
DOI: | 10.1007/978-3-662-55751-8_14 |
Popis: | An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111–117, 1998; Belazzougui et al. in ESA 8125:133–144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756–1768, 2000, Ota and Morita in Theoret Comput Sci 526:108–119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109–116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421–2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is \(\mathcal {O}(\sigma |y|)\), where \(\sigma \) is the size of the alphabet. Along the way, we show an \(\mathcal {O}(\sigma |y|)\)-time and \(\mathcal {O}(\sigma |x|)\)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words. |
Databáze: | OpenAIRE |
Externí odkaz: |