Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array
Autor: | Kunihiko Sadakane |
---|---|
Rok vydání: | 2000 |
Předmět: | |
Zdroj: | Algorithms and Computation ISBN: 9783540412557 ISAAC |
DOI: | 10.1007/3-540-40996-3_35 |
Popis: | A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies O(n log |Σ|) bits for the alphabet Σ. On the other hand, our data structure does not use the text itself, and supports important operations for text databases: inverse, search and decompress. Our algorithms can find occ occurrences of any substring P of the text in O(|P| log n + occ logƐ n) time and decompress a part of the text of length l in O(l + logƐ n) time for any given 1 ≥ Ɛ > 0. Our data structure occupies only n(2/Ɛ (3/2 + H0 + 2 log H0) + 2 + 4 logƐ n/logƐ n-1)+o(n)+O(|Σ| log |Σ|) bits where H0 ≤ log |Σ| is the order-0 entropy of the text. We also show the relationship with the opportunistic data structure of Ferragina and Manzini. |
Databáze: | OpenAIRE |
Externí odkaz: |