Improving Marlin's compression ratio with partially overlapping codewords

Authors: Manuel Martinez, Kai Sandfort, Joan Serra-Sagrista, Danny Dube
Year of publication: 2021
Subject:
Source: Recercat. Dipósit de la Recerca de Catalunya
DCC
Varias* (Consorci de Biblioteques Universitáries de Catalunya, Centre de Serveis Científics i Acadèmics de Catalunya)
Description: Marlin [1] is a Variable-to-Fixed (VF) codec optimized for decoding speed. To achieve its speed, Marlin does not encode the current state of the input source, which penalizes compression ratio. In this paper we address this penalty by partially encoding the current state of the input in the lower bits of the codeword. Those bits select which chapter of the dictionary must be used to decode the next codeword. Each chapter is specialized for a subset of states, improving compression ratio. At the same time, we use one victim chapter to encode all rare symbols, increasing the efficiency of the remaining chapters. The decoding algorithm remains the same, only now codewords have overlapping bits. Mapping techniques allow us to combine common chapters and thus keep an efficient use of the L1 cache. We evaluate our approach with both synthetic and real data sets, and show significant improvements for low-entropy sources, where compression efficiency can improve from 93.9% to 98.6%.
Database: OpenAIRE
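
The chapter-selection idea in the description above can be illustrated with a toy decoder. This is only a minimal sketch under stated assumptions, not the authors' implementation: the function name `decode`, the two-chapter dictionary `CHAPTERS`, its words, and the bit widths are all hypothetical and were not produced by any Marlin dictionary-building procedure. The sketch assumes each stored codeword is a small integer and that its lowest `overlap_bits` bits name the chapter used to decode the next codeword.

```python
def decode(codewords, chapters, overlap_bits, start_chapter=0):
    """Toy decoder: the low bits of each codeword pick the next chapter.

    Hypothetical illustration only; real Marlin dictionaries are built
    from a source model and decoded with fixed-size table lookups.
    """
    mask = (1 << overlap_bits) - 1  # selects the overlapping low bits
    chapter = start_chapter
    out = []
    for cw in codewords:
        # Look the codeword up in the chapter chosen by the previous one.
        out.append(chapters[chapter][cw])
        # The low bits of this codeword select the chapter for the next one.
        chapter = cw & mask
    return "".join(out)


# Made-up dictionary: 2 chapters (1 overlap bit), 4 words per chapter.
CHAPTERS = [
    ["a", "aa", "b", "ab"],  # chapter 0
    ["b", "a", "bb", "ba"],  # chapter 1
]

decoded = decode([1, 2, 0, 3], CHAPTERS, overlap_bits=1)
print(decoded)
```

The loop body is a plain table lookup plus a mask, which mirrors the description's remark that the decoding algorithm stays the same while codewords share bits. How chapters are specialized per state, and how the victim chapter is chosen, is not modeled here.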