L-Systems for Measuring Repetitiveness
Autor: | Navarro, Gonzalo, Urbina, Cristian |
---|---|
Jazyk: | angličtina |
Rok vydání: | 2023 |
Předmět: |
FOS: Computer and information sciences
L-systems Formal Languages and Automata Theory (cs.FL) Text compression Mathematics of computing → Combinatorics on words Computer Science - Data Structures and Algorithms Repetitiveness measures Data Structures and Algorithms (cs.DS) Computer Science - Formal Languages and Automata Theory String morphisms Theory of computation → Data compression |
DOI: | 10.4230/lipics.cpm.2023.25 |
Popis: | In order to use them for compression, we extend L-systems (without ε-rules) with two parameters d and n, and also a coding τ, which determines unambiguously a string w = τ(φ^d(s))[1:n], where φ is the morphism of the system, and s is its axiom. The length of the shortest description of an L-system generating w is known as 𝓁, and it is arguably a relevant measure of repetitiveness that builds on the self-similarities that arise in the sequence. In this paper, we deepen the study of the measure 𝓁 and its relation with a better-established measure called δ, which builds on substring complexity. Our results show that 𝓁 and δ are largely orthogonal, in the sense that one can be much larger than the other, depending on the case. This suggests that both mechanisms capture different kinds of regularities related to repetitiveness. We then show that the recently introduced NU-systems, which combine the capabilities of L-systems with bidirectional macro schemes, can be asymptotically strictly smaller than both mechanisms for the same fixed string family, which makes the size ν of the smallest NU-system the unique smallest reachable repetitiveness measure to date. We conclude that in order to achieve better compression, we should combine morphism substitution with copy-paste mechanisms. LIPIcs, Vol. 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), pages 25:1-25:17 |
Databáze: | OpenAIRE |
Externí odkaz: |