Comparison of LZ77-type parsings
Autor: | Dmitry Kosolobov, Arseny M. Shur |
---|---|
Rok vydání: | 2019 |
Předmět: |
FOS: Computer and information sciences
LZ77 Phrase Computer Science - Information Theory 0102 computer and information sciences 02 engineering and technology Type (model theory) 01 natural sciences Theoretical Computer Science Combinatorics High Energy Physics::Theory COMBINATORIAL PROBLEMS Position (vector) GREEDY PARSING NON-OVERLAPPING PHRASES 0202 electrical engineering electronic engineering information engineering COMPUTER APPLICATIONS Mathematics Series (mathematics) Information Theory (cs.IT) String (computer science) Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing) 020206 networking & telecommunications DATA PROCESSING ENCODING (SYMBOLS) Computer Science Applications COMBINATORIAL PROBLEM 010201 computation theory & mathematics Signal Processing Alphabet LOSSLESS DATA COMPRESSION Information Systems |
Zdroj: | Inf. Process. Lett. Information Processing Letters |
ISSN: | 0020-0190 |
DOI: | 10.1016/j.ipl.2018.09.005 |
Popis: | We investigate the relations between different variants of the LZ77 parsing existing in the literature. All of them are defined as greedily constructed parsings encoding each phrase by reference to a string occurring earlier in the input. They differ by the phrase encodings: encoded by pairs (length + position of an earlier occurrence) or by triples (length + position of an earlier occurrence + the letter following the earlier occurring part); and they differ by allowing or not allowing overlaps between the phrase and its earlier occurrence. For a given string of length $n$ over an alphabet of size $\sigma$, denote the numbers of phrases in the parsings allowing (resp., not allowing) overlaps by $z$ (resp., $\hat{z}$) for "pairs", and by $z_3$ (resp., $\hat{z}_3$) for "triples". We prove the following bounds and provide series of examples showing that these bounds are tight: $\bullet$ $z \le \hat{z} \le z \cdot O(\log\frac{n}{z\log_\sigma z})$ and $z_3 \le \hat{z}_3 \le z_3 \cdot O(\log\frac{n}{z_3\log_\sigma z_3})$; $\bullet$ $\frac{1}2\hat{z} < \hat{z}_3 \le \hat{z}$ and $\frac{1}2 z < z_3 \le z$. Comment: 6 pages |
Databáze: | OpenAIRE |
Externí odkaz: |