Practical and flexible pattern matching over Ziv–Lempel compressed text
Autor: | Gonzalo Navarro, Mathieu Raffinot |
---|---|
Rok vydání: | 2004 |
Předmět: |
Compressed suffix array
Theoretical computer science LZ77 Computer science LZW Commentz-Walter algorithm String searching algorithm Data_CODINGANDINFORMATIONTHEORY Approximate string matching LZ78 Ziv–Lempel compression law.invention Theoretical Computer Science Computational Theory and Mathematics Search algorithm law Compressed pattern matching Discrete Mathematics and Combinatorics Pattern matching Boyer–Moore string search algorithm Bit-parallelism |
Zdroj: | Journal of Discrete Algorithms. 2(3):347-371 |
ISSN: | 1570-8667 |
DOI: | 10.1016/j.jda.2003.12.002 |
Popis: | We address the problem of string matching on Ziv–Lempel compressed text. The goal is to search for a pattern in a text without uncompressing it. This is a highly relevant issue to keep compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts the essential features of Ziv–Lempel compression. We then apply the scheme to each particular type of compression. We present an algorithm to find all the matches of a pattern in a text compressed using LZ77. When we apply our scheme to LZ78, we obtain a much more efficient search algorithm, which is faster than uncompressing the text and then searching it. Finally, we propose a new hybrid compression scheme which is between LZ77 and LZ78, being in practice as good to compress as LZ77 and as fast to search as LZ78. We show also how to search for some extended patterns on Ziv–Lempel compressed text, such as classes of characters and approximate string matching. |
Databáze: | OpenAIRE |
Externí odkaz: |