On optimal parsing for LZ78-like compressors
Authors: | Salvatore Aronica, Francesca Marzi, Filippo Mignosi, Salvatore Mazzola, Alessio Langiu |
---|---|
Language: | English |
Year of publication: | 2018 |
Subject: |
String algorithms; Phrase; General Computer Science; Parsing algorithms; Computer science; 0102 computer and information sciences; 02 engineering and technology; Data_CODINGANDINFORMATIONTHEORY; computer.software_genre; 01 natural sciences; Lempel-Ziv compression algorithms; Text compression; Text entropy; Text factorisation; Theoretical Computer Science; Computer Science (all); 0202 electrical engineering, electronic engineering, information engineering; Entropy (information theory); Parsing; 020206 networking & telecommunications; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); 010201 computation theory & mathematics; Bounded function; computer; Algorithm |
Source: | Theoretical Computer Science 710 (2018): 19–28. doi:10.1016/j.tcs.2017.02.019 |
DOI: | 10.1016/j.tcs.2017.02.019 |
Description: | The Flexible parsing algorithm, a two-step greedy parsing algorithm for text factorisation, has been proved to be an optimal parsing for LZ78-like compressors in the case of constant-cost phrases [1], [2]. Whilst in early implementations of LZ78-like compressors the phrases have constant cost, in common modern implementations the cost of the k-th phrase is ⌈log₂ k + C⌉, where C is a real constant [3], [4]. We show examples where Flexible parsing is not optimal under this more realistic setting. In this paper we prove that, under the assumption that the cost of a phrase is block-wise constant and non-decreasing, Flexible parsing is almost optimal. By almost optimal we mean that, for any text T, the difference between the sizes of the compressed text obtained with a Flexible parsing and with an optimal parsing is bounded by the maximal cost of a phrase in T, i.e. it is logarithmic in practical cases. Furthermore, we investigate how an optimal parsing, and hence an almost optimal parsing, affects the rate of convergence to the entropy of LZ78-like compressors. We discuss some experimental results on the ratio between the speeds of convergence to the entropy of compressors with and without an optimal parsing. This ratio exhibits a kind of wave effect that grows as the entropy of a memoryless source decreases, but it always seems to converge slowly to one. According to the theory, this wave can be a tsunami for some families of highly compressible strings and, although the optimal (and the almost optimal) parsing does not improve the asymptotic speed of convergence to the entropy, it can improve the compression ratio, and hence the decoding speed, in many practical cases. |
Database: | OpenAIRE |
External link: |
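
The cost model mentioned in the description can be illustrated with a minimal sketch. It is not the authors' code and it does not implement Flexible parsing: it only shows the plain greedy LZ78 factorisation as a baseline, and then charges the k-th phrase ⌈log₂ k + C⌉ bits. The function names and the default value `c=8.0` are assumptions made for illustration only.

```python
import math


def lz78_greedy_parse(text):
    """Greedy LZ78 factorisation (baseline, NOT Flexible parsing).

    Each phrase is the longest dictionary phrase matching at the current
    position, extended by one more character when input remains.
    """
    dictionary = {""}  # set of known phrases; starts with the empty phrase
    phrases = []
    i = 0
    while i < len(text):
        current = ""
        # Extend the match while the extended string is already known.
        while i < len(text) and current + text[i] in dictionary:
            current += text[i]
            i += 1
        # Append one fresh character (if any is left) and record the phrase.
        if i < len(text):
            current += text[i]
            i += 1
            dictionary.add(current)
        phrases.append(current)
    return phrases


def phrase_cost(k, c=8.0):
    """Cost in bits of the k-th phrase (k >= 1): ceil(log2(k) + C).

    The value of C is an assumption; the description only says that C is
    a real constant.
    """
    return math.ceil(math.log2(k) + c)


def total_cost(phrases, c=8.0):
    """Total cost of a parsing under the ceil(log2(k) + C) model."""
    return sum(phrase_cost(k, c) for k in range(1, len(phrases) + 1))


if __name__ == "__main__":
    parsed = lz78_greedy_parse("abababab")
    print(parsed)
    print(total_cost(parsed))
```

Under this model the per-phrase cost is block-wise constant and non-decreasing in k, which is the assumption under which the description states that Flexible parsing is almost optimal.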