Statistical Comparative Analysis and Evaluation of Validation Indices for Clustering Optimization
Author: | Thy Nguyen, Jason Viehman, Tayo Obafemi-Ajayi, Dacosta Yeboah, Gayla R. Olbricht |
---|---|
Publication year: | 2020 |
Subject: |
Computer science; 0206 medical engineering; 02 engineering and technology; 020601 biomedical engineering; Synthetic data; Data set; Correlation; Set (abstract data type); Identification (information); Range (mathematics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; Cluster analysis; Generator (mathematics) |
Source: | SSCI |
Description: | Clustering is a useful exploratory tool for a broad range of machine learning applications because it aids the identification of meaningful subgroups. For a given clustering algorithm, multiple partitions of the same data set can be obtained by varying algorithmic parameters. Because no prior labeled data are available, internal validation indices provide a means to objectively evaluate how well the groupings obtained from a clustering configuration partition the data. This work presents a rigorous statistical evaluation framework that analyzes the performance of internal validation indices based on their correlation with external indices. A synthetic data generator that captures a wide range of complexity is proposed. Evaluation is conducted on a varied set of synthetic data types and real data sets to investigate the performance of the indices. |
Database: | OpenAIRE |
External link: |