Estimating and Controlling Overlap in Gaussian Mixtures for Clustering Methods Evaluation
Author: | Radhwane Gherbaoui, Nacéra Benamrane, Mohammed Ouali |
---|---|
Year of publication: | 2020 |
Subject: |
Clustering high-dimensional data
Thesaurus (information retrieval); Computer science; Gaussian; Engineering and technology; Natural sciences; Statistics & probability; Search engine; Artificial Intelligence; Control and Systems Engineering; Simulated data; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Data mining; Mathematics; Cluster analysis; Software; Information Systems |
Source: | International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 28:183-211 |
ISSN: | 1793-6411 0218-4885 |
DOI: | 10.1142/s0218488520500087 |
Description: | Because clustering methods are ad hoc in nature, simulated data is paramount in assessing their performance. Real datasets can also be used to evaluate clustering methods, but with the major drawback that many test scenarios go unassessed. In this paper, we propose a formal quantification of component overlap. This quantification is derived from a set of theorems that allow us to derive an automatic method for artificial data generation. We also derive a method to estimate the parameters of existing models and to evaluate the results of other approaches. Automatic estimation of the overlap rate can also be used as an unsupervised learning approach in data mining to determine the parameters of mixture models from actual observations. |
Database: | OpenAIRE |
External link: |
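
The description above refers to quantifying component overlap in a Gaussian mixture but does not state the paper's definition. As a rough illustration only, the sketch below assumes a common alternative notion: the overlap between two components is the sum of their two misclassification probabilities under the maximum weighted-density assignment rule, estimated by Monte Carlo for univariate components. The function names (`log_weighted_density`, `misclassification_rate`, `pairwise_overlap`) and the `(weight, mean, std)` tuple layout are hypothetical and are not taken from the paper.

```python
import math
import random


def log_weighted_density(x, weight, mean, std):
    """Log of weight * N(x; mean, std^2) for a univariate Gaussian component."""
    z = (x - mean) / std
    return (math.log(weight) - math.log(std)
            - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)


def misclassification_rate(comp_i, comp_j, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the probability that a point drawn from
    comp_i has a higher weighted density under comp_j.

    Assumption: each component is a (weight, mean, std) tuple.
    """
    rng = random.Random(seed)
    w_i, m_i, s_i = comp_i
    w_j, m_j, s_j = comp_j
    wrong = 0
    for _ in range(n_samples):
        x = rng.gauss(m_i, s_i)  # sample from the "true" component
        if (log_weighted_density(x, w_j, m_j, s_j)
                > log_weighted_density(x, w_i, m_i, s_i)):
            wrong += 1
    return wrong / n_samples


def pairwise_overlap(comp_a, comp_b, **kwargs):
    """Sum of the two misclassification rates (an assumed overlap measure,
    not necessarily the one derived in the paper)."""
    return (misclassification_rate(comp_a, comp_b, **kwargs)
            + misclassification_rate(comp_b, comp_a, **kwargs))


if __name__ == "__main__":
    close = pairwise_overlap((0.5, 0.0, 1.0), (0.5, 2.0, 1.0))
    far = pairwise_overlap((0.5, 0.0, 1.0), (0.5, 10.0, 1.0))
    print(f"overlap, means 2 apart:  {close:.4f}")
    print(f"overlap, means 10 apart: {far:.4f}")
```

For two equal-weight components with equal standard deviation, this measure has the closed form 2Φ(−|μ₁ − μ₂| / (2σ)), which gives a quick sanity check on the estimate. Controlling overlap in simulated data would then amount to adjusting the means or variances until the estimate reaches a target value.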