Hulling versus Clustering - Two Complementary Applications of Non-Negative Matrix Factorization

Authors: Slawomir T. Wierzchon, Mieczyslaw A. Klopotek
Year of publication: 2021
Subject:
Source: CEC
Description: In this paper we compare two NMF-based techniques of data set characterization: clustering and hulling. Characterizing a data set is understood here as describing its content through several characteristic representatives. Hulling (defined later) characterizes the data by stating that the data points lie somewhere between the representatives, while clustering characterizes the data by stating that each data point is close to one of the representatives. The precision of such a characterization is measured as the deviation from this idea: the distance of the actual data points from the closest representative in the case of clustering, and from the interior of the hull spanned by the representatives in the case of hulling. We show that for low-dimensional data the precision of the hull-based characterization is much better than that of clustering. Clustering and hulling are two examples of sophisticated optimization problems. Evolutionary algorithms are an excellent tool for solving such problems; however, for large, high-dimensional data sets their usefulness decreases. In this paper, we discuss heuristics for hulling massive data. We hope that this will inspire the creation of an effective evolutionary algorithm dedicated to solving such problems.
Database: OpenAIRE
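
The two precision measures named in the description can be illustrated with a minimal sketch. It assumes Euclidean distance and takes the hull to be the convex hull of the representatives. The function names (project_to_simplex, clustering_deviation, hull_deviation) and the projected-gradient solver are illustrative choices made here, not the method of the paper, and no NMF step is performed.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def clustering_deviation(x, reps):
    """Distance from point x to its closest representative (rows of reps)."""
    return float(np.min(np.linalg.norm(reps - x, axis=1)))


def hull_deviation(x, reps, iters=2000):
    """Distance from x to the convex hull of the rows of reps.

    Assumption: solved by projected gradient descent on the convex
    weights w, minimizing 0.5 * ||w @ reps - x||^2 over the simplex.
    """
    k = reps.shape[0]
    w = np.full(k, 1.0 / k)                   # start at the hull centroid
    lipschitz = np.linalg.norm(reps, 2) ** 2  # squared largest singular value
    if lipschitz == 0.0:                      # all representatives at origin
        return float(np.linalg.norm(x))
    for _ in range(iters):
        grad = reps @ (w @ reps - x)
        w = project_to_simplex(w - grad / lipschitz)
    return float(np.linalg.norm(w @ reps - x))


# Hypothetical representatives: the corners of a triangle in the plane.
reps = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
inside = np.array([0.25, 0.25])
outside = np.array([1.0, 1.0])

for name, point in (("inside", inside), ("outside", outside)):
    print(name,
          "clustering:", round(clustering_deviation(point, reps), 4),
          "hulling:", round(hull_deviation(point, reps), 4))
```

A point lying between the representatives has a hull deviation of zero but a positive distance to its nearest representative, which is the geometric reason the two characterizations can differ in precision.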