On the Impact of Morphisms on BWT-Runs
Autor: | Fici, Gabriele, Romana, Giuseppe, Sciortino, Marinella, Urbina, Cristian |
---|---|
Jazyk: | angličtina |
Rok vydání: | 2023 |
Předmět: | |
DOI: | 10.4230/lipics.cpm.2023.10 |
Popis: | Morphisms are widely studied combinatorial objects that can be used for generating infinite families of words. In the context of Information theory, injective morphisms are called (variable length) codes. In Data compression, the morphisms, combined with parsing techniques, have been recently used to define new mechanisms to generate repetitive words. Here, we show that the repetitiveness induced by applying a morphism to a word can be captured by a compression scheme based on the Burrows-Wheeler Transform (BWT). In fact, we prove that, differently from other compression-based repetitiveness measures, the measure r_bwt (which counts the number of equal-letter runs produced by applying BWT to a word) strongly depends on the applied morphism. More in detail, we characterize the binary morphisms that preserve the value of r_bwt(w), when applied to any binary word w containing both letters. They are precisely the Sturmian morphisms, which are well-known objects in Combinatorics on words. Moreover, we prove that it is always possible to find a binary morphism that, when applied to any binary word containing both letters, increases the number of BWT-equal letter runs by a given (even) number. In addition, we derive a method for constructing arbitrarily large families of binary words on which BWT produces a given (even) number of new equal-letter runs. Such results are obtained by using a new class of morphisms that we call Thue-Morse-like. Finally, we show that there exist binary morphisms μ for which it is possible to find words w such that the difference r_bwt(μ(w))-r_bwt(w) is arbitrarily large. LIPIcs, Vol. 259, 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), pages 10:1-10:18 |
Databáze: | OpenAIRE |
Externí odkaz: |