Random Reducts: A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets
Authors: | Jan Komorowski, Jakub Mieczkowski, Marcin Kruczyk, Nicholas Baltzer, Michał Dramiński, Jacek Koronacki |
---|---|
Year of publication: | 2013 |
Subject: | Algebra and Number Theory; Monte Carlo method; Feature selection; Pattern recognition; Synthetic data; Theoretical Computer Science; Computational Theory and Mathematics; Discriminative model; Full data; Rough set; Artificial intelligence; Data mining; Classifier (UML); Information Systems; Mathematics |
Source: | Fundamenta Informaticae. 127:273-288 |
ISSN: | 0169-2968 |
DOI: | 10.3233/fi-2013-909 |
Description: | An important step prior to constructing a classifier for a very large data set is feature selection. For many problems it is possible to find a subset of attributes that has the same discriminative power as the full data set. There are many feature selection methods, but none of them ties Rough Set models to statistical argumentation. Moreover, known feature selection methods usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR), a feature selection method that precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features, of which the user should be made aware, and to find interactions in the data set. |
Database: | OpenAIRE |
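
The description above combines two ideas: Monte Carlo sampling of attribute subsets and rough set reducts (minimal attribute sets that preserve discernibility of the decision). The following is a loose sketch of that combination, not the authors' algorithm: the record does not give the procedure, so the function names (`positive_region_size`, `greedy_reduct`, `random_reduct_scores`), the greedy reduct heuristic, the scoring rule (fraction of draws in which an attribute survives into a reduct), and the toy data are all assumptions made for illustration. It also samples attributes only, with no object sampling or train/test splits.

```python
import itertools
import random
from collections import Counter


def positive_region_size(rows, labels, attrs):
    """Count rows whose equivalence class under `attrs` has a single label."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, set()).add(label)
    return sum(
        1 for row in rows if len(groups[tuple(row[a] for a in attrs)]) == 1
    )


def greedy_reduct(rows, labels, attrs, rng):
    """Drop attributes in random order while the positive region is unchanged.

    This is a simple heuristic (an assumption), yielding one reduct of `attrs`.
    """
    target = positive_region_size(rows, labels, attrs)
    kept = list(attrs)
    order = list(attrs)
    rng.shuffle(order)
    for a in order:
        trial = [b for b in kept if b != a]
        if positive_region_size(rows, labels, trial) == target:
            kept = trial
    return kept


def random_reduct_scores(rows, labels, n_iter=200, subset_size=3, seed=0):
    """Score each attribute by how often it survives into a reduct when sampled."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    sampled, in_reduct = Counter(), Counter()
    for _ in range(n_iter):
        subset = rng.sample(range(n_attrs), subset_size)
        sampled.update(subset)
        in_reduct.update(greedy_reduct(rows, labels, subset, rng))
    return {
        a: (in_reduct[a] / sampled[a] if sampled[a] else 0.0)
        for a in range(n_attrs)
    }


# Hypothetical synthetic table: column 2 duplicates column 0 (a "shadowed"
# feature), columns 3 and 4 are noise, and the label is column 0 XOR column 1.
rows = [
    (x0, x1, x0, x3, x4)
    for x0, x1, x3, x4 in itertools.product((0, 1), repeat=4)
]
labels = [r[0] ^ r[1] for r in rows]

scores = random_reduct_scores(rows, labels)
for attr, score in sorted(scores.items()):
    print(f"attribute {attr}: score {score:.2f}")
```

In this toy table, the duplicated column 2 receives a nonzero score alongside column 0, because the random removal order sometimes keeps one and sometimes the other, while the noise columns are always dropped. That mirrors, in a very simplified way, the idea of surfacing shadowed features instead of discarding them.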