Computing NP-hard Repetitiveness Measures via MAX-SAT

Autor: Bannai, Hideo, Goto, Keisuke, Ishihata, Masakazu, Kanda, Shunsuke, Köppl, Dominik, Nishimoto, Takaaki
Rok vydání: 2022
Předmět:
Druh dokumentu: Working Paper
Popis: Repetitiveness measures reveal profound characteristics of datasets, and give rise to compressed data structures and algorithms working in compressed space. Alas, the computation of some of these measures is NP-hard, and straight-forward computation is infeasible for datasets of even small sizes. Three such measures are the smallest size of a string attractor, the smallest size of a bidirectional macro scheme, and the smallest size of a straight-line program. While a vast variety of implementations for heuristically computing approximations exist, exact computation of these measures has received little to no attention. In this paper, we present MAX-SAT formulations that provide the first non-trivial implementations for exact computation of smallest string attractors, smallest bidirectional macro schemes, and smallest straight-line programs. Computational experiments show that our implementations work for texts of length up to a few hundred for straight-line programs and bidirectional macro schemes, and texts even over a million for string attractors.
Comment: paper accepted to ESA 2022 (plus Appendix); corrected attribution of Python program for computing https://oeis.org/A339391
Databáze: arXiv