Classification of Periodic, Chaotic and Random Sequences using NSRPS Complexity Measure

Author: Balasubramanian, Karthi; Prabhu, Gayathri R.; K., Lakshmipriya V.; Krishnan, Maneesha; R., Praveena; Nagaraj, Nithin
Year of publication: 2012
Subject:
Document type: Working Paper
Description: Data compression algorithms are generally perceived as being of interest for data communication and storage purposes only. However, their use in data classification and analysis is equally important. Automatic data classification and analysis find use in varied fields such as bioinformatics, language and sequence recognition, and authorship attribution. Complexity measures proposed in the literature, such as Shannon entropy, relative entropy, and Kolmogorov and algorithmic complexity, have drawbacks that make them ineffective for analyzing the short sequences typical of population dynamics and other fields. In this paper, we study Non-Sequential Recursive Pair Substitution (NSRPS), a lossless compression algorithm first proposed by Ebeling et al. [Math. Biosc. 52, 1980] and Jiménez-Montaño et al. [arXiv:cond-mat/0204134, 2002]. Using this algorithm, a new complexity measure was recently proposed (Nagaraj et al. [arXiv:nlin.CD/1101.4341v1, 2011]). In this work, we use the NSRPS complexity measure to analyze and classify symbolic sequences generated by 1D chaotic dynamical systems. Even with learning data sets of length as small as 25 and test data sets of length as small as 10, the NSRPS measure is able to accurately classify the test sequence as periodic, chaotic or random. For such short data lengths, methods that use entropy measures and traditional lossless compression algorithms such as LZ77 [A. Lempel and J. Ziv, IEEE Trans. Inform. Theory 22, 75 (1976)] (used, for instance, by Gzip, Winzip, etc.) fail.
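The abstract names the NSRPS algorithm without spelling out its steps. The following is a minimal Python sketch of one common reading of it, not the authors' implementation. It assumes that each pass replaces the most frequent adjacent pair of symbols (counted without overlaps) with a fresh symbol, and that the complexity value is the number of passes needed before the sequence becomes constant or shrinks to a single symbol. The function names `pair_counts`, `substitute` and `nsrps_complexity`, the tie-breaking rule, and the stopping criterion are all assumptions made for illustration.

```python
from collections import Counter


def pair_counts(seq):
    """Count adjacent pairs, skipping overlapping repeats such as the
    second "aa" in "aaa" (assumed counting convention)."""
    counts = Counter()
    prev_counted = -2  # index of the last counted identical-symbol pair
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a == b:
            if i - 1 == prev_counted:
                continue  # overlaps the pair counted at the previous index
            prev_counted = i
        counts[(a, b)] += 1
    return counts


def substitute(seq, pair, new_symbol):
    """Replace every non-overlapping occurrence of `pair`, scanning
    left to right, with `new_symbol`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def nsrps_complexity(symbols):
    """Number of pair-substitution passes until the sequence is constant
    or has length 1 (assumed stopping rule)."""
    seq = list(symbols)
    steps = 0
    while len(seq) > 1 and len(set(seq)) > 1:
        # Ties are broken by first occurrence (an arbitrary choice here).
        pair, _ = pair_counts(seq).most_common(1)[0]
        # A tuple tag keeps new symbols distinct from plain input symbols.
        seq = substitute(seq, pair, ("#", steps))
        steps += 1
    return steps


if __name__ == "__main__":
    for s in ("aaaa", "abab", "11010010"):
        print(s, nsrps_complexity(s))
```

Under these assumptions, a constant sequence such as "aaaa" scores 0, the period-2 sequence "abab" scores 1, and "11010010" scores 5. Classification as periodic, chaotic or random would then compare such scores between learning and test sequences; that step is not sketched here because the abstract does not describe it.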
Comment: 4 pages, 4 tables, accepted for oral presentation at National Conference on Nonlinear Systems and Dynamics, IISER Pune, July 2012
Database: arXiv