Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array

Autor:	Kunihiko Sadakane
Rok vydání:	2000
Předmět:	Compressed suffix array Database Computer science Entropy (information theory) Inverted index Data structure computer.software_genre computer Algorithm Substring Data compression
Zdroj:	Algorithms and Computation ISBN: 9783540412557 ISAAC
DOI:	10.1007/3-540-40996-3_35
Popis:	A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies O(n log \|Σ\|) bits for the alphabet Σ. On the other hand, our data structure does not use the text itself, and supports important operations for text databases: inverse, search and decompress. Our algorithms can find occ occurrences of any substring P of the text in O(\|P\| log n + occ logƐ n) time and decompress a part of the text of length l in O(l + logƐ n) time for any given 1 ≥ Ɛ > 0. Our data structure occupies only n(2/Ɛ (3/2 + H0 + 2 log H0) + 2 + 4 logƐ n/logƐ n-1)+o(n)+O(\|Σ\| log \|Σ\|) bits where H0 ≤ log \|Σ\| is the order-0 entropy of the text. We also show the relationship with the opportunistic data structure of Ferragina and Manzini.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_________::f9c3618a0dd3f793a68b6b3eb95f5e67 https://doi.org/10.1007/3-540-40996-3_35 Zobrazit plný text záznamu