Probability Model Based on Cluster Analysis to Classify Sequences of Observations for Small Training Sets

Authors: Sergey S. Yulin, Irina N. Palamar
Year of publication: 2020
Subject:
Source: Statistics, Optimization & Information Computing. 8:296-303
ISSN: 2310-5070, 2311-004X
DOI: 10.19139/soic-2310-5070-690
Description: Pattern recognition with little training data is a particularly relevant problem that arises when collecting training data is expensive or essentially impossible. This work proposes a new probability model, MC&CL (Markov Chain and Clusters), which combines a Markov chain with a clustering algorithm (Kohonen self-organizing map, k-means) to classify sequences of observations when the training dataset is small. An original experimental comparison is made between the developed model (MC&CL) and several other popular sequence classification models: HMM (Hidden Markov Model), HCRF (Hidden Conditional Random Fields), LSTM (Long Short-Term Memory), and kNN+DTW (k-Nearest Neighbors with Dynamic Time Warping). The comparison uses synthetic random sequences generated from a hidden Markov model, with noise added to the training samples. The proposed model is shown to achieve the best classification accuracy among the models under review when the amount of training data is small.
Database: OpenAIRE
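
The record does not give the details of MC&CL, so the following is not the authors' model. It is only a minimal sketch of the general idea the description names: cluster the observations into a small set of discrete states, fit one Markov chain per class over those states, and assign a new sequence to the class whose chain gives it the highest likelihood. Everything here is an assumption made for illustration: the function names (kmeans_1d, fit_chain, train, classify), the use of scalar observations, plain k-means with a deterministic initialization in place of a Kohonen map, add-alpha smoothing, and the toy data.

```python
import math

def nearest(centroids, v):
    """Index of the centroid closest to the scalar observation v."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - v))

def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on scalars (illustrative stand-in for the clustering step)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialization: k evenly spaced order statistics.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            buckets[nearest(centroids, v)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(b) / len(b) if b else c
                     for b, c in zip(buckets, centroids)]
    return sorted(centroids)

def fit_chain(sequences, centroids, alpha=1.0):
    """Log initial and log transition probabilities with add-alpha smoothing."""
    k = len(centroids)
    init = [alpha] * k
    trans = [[alpha] * k for _ in range(k)]
    for seq in sequences:  # assumes every sequence is non-empty
        states = [nearest(centroids, v) for v in seq]
        init[states[0]] += 1
        for a, b in zip(states, states[1:]):
            trans[a][b] += 1
    init_total = sum(init)
    log_init = [math.log(c / init_total) for c in init]
    log_trans = [[math.log(c / sum(row)) for c in row] for row in trans]
    return log_init, log_trans

def log_likelihood(seq, centroids, model):
    """Log-likelihood of a sequence under one class's Markov chain."""
    log_init, log_trans = model
    states = [nearest(centroids, v) for v in seq]
    ll = log_init[states[0]]
    for a, b in zip(states, states[1:]):
        ll += log_trans[a][b]
    return ll

def train(labeled, k=3):
    """labeled maps a class label to a list of training sequences."""
    pooled = [v for seqs in labeled.values() for s in seqs for v in s]
    centroids = kmeans_1d(pooled, k)  # one shared set of states for all classes
    models = {label: fit_chain(seqs, centroids)
              for label, seqs in labeled.items()}
    return centroids, models

def classify(seq, centroids, models):
    """Pick the class whose chain assigns the sequence the highest likelihood."""
    return max(models,
               key=lambda label: log_likelihood(seq, centroids, models[label]))

# Toy data (invented): two training sequences per class.
toy = {
    "rising": [[0.1, 5.0, 9.8, 0.0, 5.2, 10.1], [0.0, 4.9, 10.0, 0.2, 5.1, 9.9]],
    "falling": [[10.0, 5.1, 0.1, 9.9, 4.8, 0.0], [9.8, 5.0, 0.0, 10.2, 5.0, 0.1]],
}
centroids, models = train(toy, k=3)
print(classify([0.0, 5.0, 10.0, 0.1, 4.9], centroids, models))
```

Smoothing matters in this setting: with only a few training sequences, many transitions are never observed, and without a positive alpha a single unseen transition would drive a class's likelihood to zero.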