Extended-Alphabet Finite-Context Models

Author: Carvalho, João M.; Brás, Susana; Pratas, Diogo; Ferreira, Jacqueline; Soares, Sandra C.; Pinho, Armando J.
Year of publication: 2017
Subject:
Document type: Working Paper
DOI: 10.1016/j.patrec.2018.05.026
Description: The Normalized Relative Compression (NRC) is a recent dissimilarity measure related to Kolmogorov complexity. It has been applied successfully to several kinds of data, such as DNA sequences, images, and even electrocardiographic (ECG) signals. It relies on a compressor that compresses a target string using exclusively the information contained in a reference string. One possible approach is to use finite-context models (FCMs) to represent the strings. A finite-context model calculates the probability distribution of the next symbol, given the previous $k$ symbols. In this paper, we introduce a generalization of FCMs, called extended-alphabet finite-context models (xaFCMs), that calculate the probability of occurrence of the next $d$ symbols, given the previous $k$ symbols. We perform experiments on two sample applications using xaFCMs and the NRC measure: ECG biometric identification, using a publicly available database, and estimation of the similarity between the DNA sequences of two different but related species, chromosome by chromosome. In both applications, we compare the results against those obtained with FCMs. The results show that xaFCMs use less memory and computational time to achieve the same or, in some cases, even more accurate results.
Database: arXiv
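
The description above says that an xaFCM gives the probability of the next $d$ symbols given the previous $k$ symbols, and that the NRC compresses a target string using only a model built from a reference string. The following is a minimal Python sketch of that idea, not the authors' implementation. Several details are assumptions that the record does not state: the function names (`train_xafcm`, `relative_bits`, `nrc`), the additive smoothing parameter `alpha`, training with a window that slides one symbol at a time, coding the target in non-overlapping blocks of `d` symbols, and normalizing by the number of coded symbols times `log2` of the alphabet size. Setting `d = 1` gives an ordinary FCM under the same assumptions.

```python
import math
from collections import defaultdict


def train_xafcm(reference, k, d):
    """Count how often each d-symbol block follows each k-symbol context.

    Assumption: the training window slides one symbol at a time.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(k, len(reference) - d + 1):
        context = reference[i - k:i]
        block = reference[i:i + d]
        counts[context][block] += 1
    return counts


def relative_bits(target, counts, k, d, alphabet_size, alpha=1.0):
    """Return (bits, coded_symbols) for the target under a frozen model.

    Assumption: additive smoothing with parameter alpha over the
    extended alphabet of alphabet_size ** d possible blocks. The model
    is never updated with the target, so only the reference informs it.
    """
    n_blocks = alphabet_size ** d
    bits = 0.0
    coded_symbols = 0
    # Assumption: the target is coded in non-overlapping blocks of d symbols.
    for i in range(k, len(target) - d + 1, d):
        context = target[i - k:i]
        block = target[i:i + d]
        followers = counts.get(context, {})
        total = sum(followers.values())
        p = (followers.get(block, 0) + alpha) / (total + alpha * n_blocks)
        bits -= math.log2(p)
        coded_symbols += d
    return bits, coded_symbols


def nrc(target, reference, k, d, alphabet_size, alpha=1.0):
    """Bits needed for the target given the reference, normalized.

    Assumption: normalization by coded_symbols * log2(alphabet_size),
    so a value near 0 means 'very similar' and near 1 means 'unrelated'.
    """
    counts = train_xafcm(reference, k, d)
    bits, coded_symbols = relative_bits(
        target, counts, k, d, alphabet_size, alpha
    )
    if coded_symbols == 0:
        raise ValueError("target is too short for the chosen k and d")
    return bits / (coded_symbols * math.log2(alphabet_size))


if __name__ == "__main__":
    reference = "ACGT" * 50
    similar = "ACGT" * 20
    unrelated = "A" * 80
    print(nrc(similar, reference, k=2, d=2, alphabet_size=4))
    print(nrc(unrelated, reference, k=2, d=2, alphabet_size=4))
```

In this sketch a target that repeats the reference's pattern scores well below a target whose contexts never occur in the reference. The memory and time comparison between FCMs and xaFCMs reported in the description is not reproduced here.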