Comparative study of feature selection with ensemble learning using SOM variants
Author: | Najet Arous, Chiraz Jlassi, Ameni Filali |
---|---|
Year of publication: | 2017 |
Subject: |
Computer science; business.industry; Dimensionality reduction; Pattern recognition; Feature selection; 02 engineering and technology; computer.software_genre; 01 natural sciences; Ensemble learning; Partition (database); Random forest; Visualization; ComputingMethodologies_PATTERNRECOGNITION; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; 010306 general physics; business; Cluster analysis; computer |
Source: | ICMV |
ISSN: | 0277-786X |
DOI: | 10.1117/12.2268538 |
Description: | Ensemble learning has succeeded in improving stability and clustering accuracy, but the runtime of ensemble methods prevents them from scaling up to real-world applications. This study deals with the problem of selecting, for every cluster in a dataset, a subset of the most pertinent features. The proposed method is an extension of the Random Forests approach to unlabeled data: it uses self-organizing map (SOM) variants and estimates the out-of-bag feature importance from a set of partitions. Every partition is created using a different bootstrap sample and a random subset of the features. We then show that the internal estimates used to measure variable pertinence in Random Forests are also applicable to feature selection in unsupervised learning. The approach aims at dimensionality reduction, visualization, and cluster characterization at the same time. We provide empirical results on nineteen benchmark data sets indicating that RFS can lead to a significant improvement in clustering accuracy over several state-of-the-art unsupervised methods, using only a very limited subset of features. The approach shows promise for application to very broad domains. |
Database: | OpenAIRE |
External link: |
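
The description above outlines the procedure only in words: each partition is built from a bootstrap sample and a random subset of the features, and feature importance is estimated on the out-of-bag samples. The following Python sketch illustrates that general idea and is not the authors' implementation. Several points are assumptions, since the record does not specify them: the tiny one-dimensional SOM (`train_som`), the use of the increase in out-of-bag quantization error after permuting a feature as the importance score, and all function and parameter names.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def bmu(units, x):
    """Index of the best-matching unit (closest prototype) for x."""
    return min(range(len(units)), key=lambda j: sqdist(units[j], x))

def quant_error(units, rows):
    """Mean squared distance from each row to its best-matching unit."""
    return sum(sqdist(units[bmu(units, x)], x) for x in rows) / len(rows)

def train_som(rows, n_units=2, epochs=20, lr=0.5, rng=None):
    """Assumed stand-in for a SOM variant: units on a line, with a
    neighbourhood radius of 1 that shrinks to 0 halfway through training."""
    rng = rng or random.Random(0)
    units = [list(rng.choice(rows)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs            # decays from 1 toward 0
        radius = 1 if frac > 0.5 else 0
        for x in rng.sample(rows, len(rows)):  # rows in random order
            b = bmu(units, x)
            for j, u in enumerate(units):
                if abs(j - b) <= radius:
                    h = lr * frac * (1.0 if j == b else 0.5)
                    for d in range(len(u)):
                        u[d] += h * (x[d] - u[d])
    return units

def oob_feature_importance(data, n_partitions=30, n_sub=2, n_units=2, seed=0):
    """Average increase in out-of-bag quantization error when one feature
    is permuted (an assumed importance measure, by analogy with Random Forests)."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    gain, count = [0.0] * d, [0] * d
    for _ in range(n_partitions):
        in_bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        bag_set = set(in_bag)
        oob = [i for i in range(n) if i not in bag_set]    # out-of-bag rows
        if len(oob) < 2:
            continue
        feats = rng.sample(range(d), n_sub)                # random feature subset
        train = [[data[i][f] for f in feats] for i in in_bag]
        oob_rows = [[data[i][f] for f in feats] for i in oob]
        units = train_som(train, n_units=n_units, rng=rng)
        base = quant_error(units, oob_rows)
        for k, f in enumerate(feats):
            col = [r[k] for r in oob_rows]
            rng.shuffle(col)                               # permute one feature
            perm = [r[:k] + [v] + r[k + 1:] for r, v in zip(oob_rows, col)]
            gain[f] += quant_error(units, perm) - base
            count[f] += 1
    return [g / c if c else 0.0 for g, c in zip(gain, count)]

if __name__ == "__main__":
    rng = random.Random(1)
    # Two clusters separated on features 0 and 1; feature 2 is pure noise.
    data = [[c + rng.gauss(0, 0.1), c + rng.gauss(0, 0.1), rng.random()]
            for c in (0.0, 5.0) for _ in range(20)]
    print(oob_feature_importance(data))
```

In this sketch a feature scores highly only when permuting it moves out-of-bag points away from the learned prototypes, so features that merely carry noise should receive scores close to zero. Ranking features by the returned scores and keeping the top few is one plausible way to obtain the reduced feature subset the description mentions.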