Generation of Gaussian sets for clustering methods assessment

Author:	Nacéra Benamrane, Mohammed Ouali, Radhwane Gherbaoui
Year of publication:	2021
Subject:	Information Systems and Management; business.industry; Computer science; Gaussian; Pattern recognition; 02 engineering and technology; 01 natural sciences; Fuzzy logic; Data set; Set (abstract data type); 010104 statistics & probability; symbols.namesake; Expectation–maximization algorithm; 0202 electrical engineering, electronic engineering, information engineering; symbols; 020201 artificial intelligence & image processing; Sensitivity (control systems); Artificial intelligence; 0101 mathematics; business; Cluster analysis; Generator (mathematics)
Source:	Data & Knowledge Engineering. :101876
ISSN:	0169-023X
DOI:	10.1016/j.datak.2021.101876
Description:	Clustering methods are generally used to study the homogeneity in a set of observations. The results of the clustering process differ from one method to another, and even the same method or validity index can give different outcomes depending on the initial parameters. Analytical evaluation appears to be insufficient for studying the behavior of clustering methods because of its ad hoc nature. Even when real data sets are used to evaluate clustering methods, artificial data remain fundamental for assessing performance, since they allow the creation of different test scenarios with known structures. The main drawback of existing methods for generating artificial data is that they do not take into account the problem of sensitivity to cluster size. In this paper, we propose an automatic method: the high-dimensional artificial Gaussian mixture generator. By formally quantifying the overlap, the generator preserves the notion of the overlap rate between the mixture components. The advantages of this generator are its use of the notion of overlap rate, an unlimited number of mixture components, high dimensionality of the observations, and no reliance on visual inspection as a criterion to quantify the overlap. In addition, we evaluate k-means, fuzzy c-means (FCM), the FCM-based splitting algorithm (FBSA), and expectation maximization (EM) in different dimensions. The results confirm previous work and reveal new findings that are not apparent when using 1D and 2D artificial data.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d53dacb71e5ec973d029b9fe06103dfa https://doi.org/10.1016/j.datak.2021.101876
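
Illustrative note: the description mentions generating labelled Gaussian mixtures in arbitrary dimension with a controlled degree of overlap. The record does not define the paper's overlap rate, so the sketch below is only a rough illustration under stated assumptions, not the authors' generator. It uses spherical, unit-variance components whose means are spaced along one axis. The names `generate_gaussian_mixture`, `nearest_mean_error`, and the `separation` parameter are hypothetical.

```python
import math
import random


def component_means(n_components, dim, separation):
    """Means placed `separation` apart along the first axis (an assumption)."""
    return [[k * separation] + [0.0] * (dim - 1) for k in range(n_components)]


def generate_gaussian_mixture(n_components, dim, points_per_component,
                              separation, seed=0):
    """Draw labelled points from spherical, unit-variance Gaussians.

    `separation` is the distance between neighbouring means in units of
    the standard deviation; smaller values give more overlap. It is a
    crude stand-in, NOT the overlap rate defined in the paper.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k, mean in enumerate(component_means(n_components, dim, separation)):
        for _ in range(points_per_component):
            points.append([rng.gauss(m, 1.0) for m in mean])
            labels.append(k)
    return points, labels


def nearest_mean_error(points, labels, n_components, dim, separation):
    """Fraction of points that lie closer to another component's true mean."""
    means = component_means(n_components, dim, separation)
    wrong = 0
    for point, label in zip(points, labels):
        dists = [math.dist(point, mean) for mean in means]
        if dists.index(min(dists)) != label:
            wrong += 1
    return wrong / len(points)
```

For example, calling `generate_gaussian_mixture(3, 5, 200, 6.0)` returns 600 five-dimensional points with their component labels. Because the structure is known, `nearest_mean_error` gives a simple measure of how much the components mix for a given `separation`, which is the kind of known ground truth that artificial data provide when comparing clustering methods.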