Minimal Absent Words in a Sliding Window and Applications to On-Line Pattern Matching

Autor: Solon P. Pissis, Yann Ramusat, Maxime Crochemore, Alice Héliou, Gregory Kucherov, Laurent Mouchard
Přispěvatelé: Klasing, Ralf, Zeitoun, Marc, Informatics, King‘s College London, Laboratoire d'informatique de l'École polytechnique [Palaiseau] (LIX), Centre National de la Recherche Scientifique (CNRS)-École polytechnique (X), Laboratoire d'Informatique Gaspard-Monge (LIGM), Centre National de la Recherche Scientifique (CNRS)-Fédération de Recherche Bézout-ESIEE Paris-École des Ponts ParisTech (ENPC)-Université Paris-Est Marne-la-Vallée (UPEM), Equipe Traitement de l'information en Biologie Santé (TIBS - LITIS), Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université Le Havre Normandie (ULH), Normandie Université (NU)-Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Université Le Havre Normandie (ULH), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA), Department of Informatics [King's College London], Université Paris sciences et lettres (PSL), École polytechnique (X)-Centre National de la Recherche Scientifique (CNRS), Université Paris-Est Marne-la-Vallée (UPEM)-École des Ponts ParisTech (ENPC)-ESIEE Paris-Fédération de Recherche Bézout-Centre National de la Recherche Scientifique (CNRS)
Rok vydání: 2017
Předmět:
Zdroj: Fundamentals of Computation Theory ISBN: 9783662557501
FCT
Crochemore, M, Heliou, A, Kucherov, G, Mouchard, L, Pissis, S P & Ramusat, Y 2017, Minimal Absent Words in a Sliding Window and Applications to
Pattern Matching . in R Klasing & M Zeitoun (eds), Fundamentals of Computation Theory: 21st International Symposium, FCT 2017, Bordeaux, France, September 11--13, 2017, Proceedings . vol. 10472, Lecture Notes in Computer Science, Berlin, Heidelberg, pp. 164-176 . https://doi.org/10.1007/978-3-662-55751-8_14
Fundamentals of Computation Theory
Fundamentals of Computation Theory, Sep 2017, Bordeaux, France. pp.164-176, ⟨10.1007/978-3-662-55751-8_14⟩
DOI: 10.1007/978-3-662-55751-8_14
Popis: An absent (or forbidden) word of a word y is a word that does not occur in y. It is then called minimal if all its proper factors occur in y. There exist linear-time and linear-space algorithms for computing all minimal absent words of y (Crochemore et al. in Inf Process Lett 67:111–117, 1998; Belazzougui et al. in ESA 8125:133–144, 2013; Barton et al. in BMC Bioinform 15:388, 2014). Minimal absent words are used for data compression (Crochemore et al. in Proc IEEE 88:1756–1768, 2000, Ota and Morita in Theoret Comput Sci 526:108–119, 2014) and for alignment-free sequence comparison by utilizing a metric based on minimal absent words (Chairungsee and Crochemore in Theoret Comput Sci 450:109–116, 2012). They are also used in molecular biology; for instance, three minimal absent words of the human genome were found to play a functional role in a coding region in Ebola virus genomes (Silva et al. in Bioinformatics 31:2421–2425, 2015). In this article we introduce a new application of minimal absent words for on-line pattern matching. Specifically, we present an algorithm that, given a pattern x and a text y, computes the distance between x and every window of size |x| on y. The running time is \(\mathcal {O}(\sigma |y|)\), where \(\sigma \) is the size of the alphabet. Along the way, we show an \(\mathcal {O}(\sigma |y|)\)-time and \(\mathcal {O}(\sigma |x|)\)-space algorithm to compute the minimal absent words of every window of size |x| on y, together with some new combinatorial insight on minimal absent words.
Databáze: OpenAIRE