Generation of Gaussian sets for clustering methods assessment

Autor: Nacéra Benamrane, Mohammed Ouali, Radhwane Gherbaoui
Rok vydání: 2021
Předmět:
Zdroj: Data & Knowledge Engineering. :101876
ISSN: 0169-023X
DOI: 10.1016/j.datak.2021.101876
Popis: Clustering methods are generally used to study the homogeneity in a set of observations. The results obtained from the clustering process differ from one method to another, to the extent that the same method or validity index gives different outcomes depending on the initial parameters. Analytical evaluation appears to be insufficient for studying the behavior of clustering methods due to its ad hoc nature. Even if the real data set is used in evaluating clustering methods, artificial data is fundamental for assessing the performance since it allows creating different scenarios of test with known structures. The main drawback of existing methods of artificial data is that they do not take into consideration the problem of sensitivity to the size of clusters. In this paper, we propose an automatic method: the high-dimensional artificial Gaussian mixture generator. By formally quantifying the overlap, the generator preserves the notion of the overlap rate between the mixture components. The advantages of this generator are its use of the notion of overlap rate, the unlimited number of mixture components, high-dimensionality of the observations, and the non-utilization of visual inspection as a criterion to quantify the overlap. In addition, we evaluate the k-means, fuzzy c-means (FCM), FCM-based splitting algorithm (FBSA), and expectation maximization (EM) in different dimensions. The results obtained confirm previous work and reveal new findings that are not pointed out when using 1D and 2D artificial data. 1
Databáze: OpenAIRE