Random Reducts: A Monte Carlo Rough Set-based Method for Feature Selection in Large Datasets

Authors: Jan Komorowski, Jakub Mieczkowski, Marcin Kruczyk, Nicholas Baltzer, Michał Dramiński, Jacek Koronacki
Publication year: 2013
Subject:
Source: Fundamenta Informaticae. 127:273-288
ISSN: 0169-2968
DOI: 10.3233/fi-2013-909
Description: An important step prior to constructing a classifier for a very large data set is feature selection. In many problems it is possible to find a subset of attributes that has the same discriminative power as the full data set. There are many feature selection methods, but in none of them are Rough Set models tied to statistical argumentation. Moreover, known methods of feature selection usually discard shadowed features, i.e. those carrying the same or partially the same information as the selected features. In this study we present Random Reducts (RR), a feature selection method which precedes classification per se. The method is based on the Monte Carlo Feature Selection (MCFS) layout and uses Rough Set Theory in the feature selection process. On synthetic data, we demonstrate that the method is able to select otherwise shadowed features, of which the user should be made aware, and to find interactions in the data set.
Database: OpenAIRE
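
Illustration: The description above names the general idea (Monte Carlo sampling of attribute subsets combined with rough set reducts) without giving the algorithm. The following is a minimal sketch of that idea only, not the authors' method. It assumes discrete attributes, a positive-region criterion for reducts, and a plain occurrence count as the feature score; the function names, parameters, and toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def positive_region_size(rows, labels, attrs):
    """Count objects whose equivalence class under `attrs` has a single label."""
    groups = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        group = groups.setdefault(key, [set(), 0])
        group[0].add(label)
        group[1] += 1
    return sum(n for decisions, n in groups.values() if len(decisions) == 1)


def one_reduct(rows, labels, attrs, rng):
    """Backward elimination: drop attributes while the positive region is kept.

    Assumption: a reduct is taken relative to the sampled subset `attrs`,
    and attributes are tried in random order.
    """
    target = positive_region_size(rows, labels, attrs)
    reduct = list(attrs)
    order = list(attrs)
    rng.shuffle(order)
    for a in order:
        trial = [b for b in reduct if b != a]
        if positive_region_size(rows, labels, trial) == target:
            reduct = trial
    return reduct


def random_reducts_scores(rows, labels, n_subsets=200, subset_size=2, seed=0):
    """Score each attribute by how often it appears in a reduct of a random subset."""
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    counts = Counter()
    for _ in range(n_subsets):
        subset = rng.sample(range(n_attrs), subset_size)
        counts.update(one_reduct(rows, labels, subset, rng))
    return counts


# Hypothetical toy data: attribute 1 duplicates attribute 0 (a shadowed feature),
# attributes 2 and 3 are unrelated to the label.
rows = [(a, a, b, c) for a, b, c in product([0, 1], repeat=3)]
labels = [row[0] for row in rows]
scores = random_reducts_scores(rows, labels)
```

Because each random subset may contain attribute 1 without attribute 0, the duplicated attribute still collects a score in this sketch, whereas a single reduct computed on the full table would keep only one of the two.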