Sampling Based Approximate Spectral Clustering Ensemble For Partitioning Datasets
Author: | Yaser Moazzen, Kadim Tasdemir |
---|---|
Year of publication: | 2016 |
Subject: |
Fuzzy clustering; business.industry; Computation; Quantization (signal processing); Correlation clustering; 0211 other engineering and technologies; Pattern recognition; 02 engineering and technology; computer.software_genre; Ensemble learning; Spectral clustering; ComputingMethodologies_PATTERNRECOGNITION; Parametric model; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; business; Cluster analysis; computer; 021101 geological & geomatics engineering; Mathematics |
Source: | ICPR |
Description: | Spectral clustering can extract clusters with various characteristics without a parametric model; however, it is infeasible for large datasets because of its high computational cost and memory requirements. Approximate spectral clustering (ASC) addresses this challenge with a representative-based partitioning approach that first finds a set of data representatives, either by sampling or by quantization, and then applies spectral clustering to them. To achieve an optimal partitioning with ASC, several sampling and quantization methods, together with advanced similarity criteria, have recently been proposed. Although quantization is more accurate than sampling at the expense of heavier computation, and geodesic-based hybrid similarity criteria are often more informative than others, no single solution is optimal for all datasets. As an alternative, we propose using ensemble learning to produce a consensus partitioning constructed from different sets of representatives and similarity criteria. The proposed ensemble (SASCE) not only produces a relatively more accurate partitioning but also eliminates the need to determine the best pair (the optimum set of representatives and the optimum similarity criterion). Thanks to the efficient similarity definition at the representative level, SASCE can be powerful for clustering small and medium datasets, outperforming traditional clustering approaches and their ensembles. |
Database: | OpenAIRE |
External link: |
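
The abstract describes building a consensus partitioning from several ASC runs that differ in their representatives and similarity criteria. The record does not say which consensus function SASCE uses, so the following is only a minimal sketch under stated assumptions: it uses a co-association (pairwise agreement) consensus, extends a representative-level partition to the data by nearest-representative assignment, and omits the spectral clustering step itself by taking the representative labels of each base run as given. All function names, the threshold, and the toy data are hypothetical.

```python
from itertools import combinations


def extend_to_data(data, representatives, rep_labels):
    """Give each data point the label of its nearest representative."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    labels = []
    for point in data:
        nearest = min(range(len(representatives)),
                      key=lambda r: sq_dist(point, representatives[r]))
        labels.append(rep_labels[nearest])
    return labels


def co_association(partitions):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Assumed consensus rule: link pairs that agree in more than `threshold`
    of the base partitions, then return the connected components."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Toy data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

# Base runs A and B: different representative sets whose labels are assumed
# to come from spectral clustering on the representatives (not shown here).
run_a = extend_to_data(data, [(0.1, 0.1), (5.0, 5.0)], [0, 1])
run_b = extend_to_data(data, [(0.0, 0.0), (0.2, 0.1), (5.0, 5.1)], [1, 1, 0])
# Base run C: a hypothetical noisy partition that misplaces the third point.
run_c = [0, 0, 1, 1, 1, 1]

consensus = consensus_labels([run_a, run_b, run_c])
print(consensus)  # [0, 0, 0, 1, 1, 1]
```

In this toy case the third base partition disagrees about one point, but the pairwise agreement across the other two runs outvotes it, which illustrates why an ensemble can remove the need to pick a single best pair of representatives and similarity criterion.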