GB-AFS: graph-based automatic feature selection for multi-class classification via Mean Simplified Silhouette

Authors: David Levin, Gonen Singer
Language: English
Year of publication: 2024
Subject:
Source: Journal of Big Data, Vol 11, Iss 1, Pp 1-22 (2024)
Document type: article
ISSN: 2196-1115
DOI: 10.1186/s40537-024-00934-5
Description (abstract): This paper introduces a novel graph-based filter method for automatic feature selection (abbreviated as GB-AFS) for multi-class classification tasks. The method determines the minimum combination of features required to sustain prediction performance while maintaining complementary discriminating abilities between different classes. It does not require any user-defined parameters, such as the number of features to select. The minimum number of features is selected using our newly developed Mean Simplified Silhouette (abbreviated as MSS) index, designed to evaluate the clustering results for the feature selection task. To illustrate the effectiveness and generality of the method, we applied GB-AFS using various combinations of statistical measures and dimensionality reduction techniques. The experimental results demonstrate the superior performance of GB-AFS over other filter-based techniques and automatic feature selection approaches, and show that the method is independent of the statistical measure or dimensionality reduction technique chosen by the user. Moreover, the proposed method maintained the accuracy achieved when using all features while using only 7–30% of the original features. This resulted in an average time saving ranging from 15% for the smallest dataset to 70% for the largest. Our code is available at https://github.com/davidlevinwork/gbfs/ .
Database: Directory of Open Access Journals
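The abstract names the Mean Simplified Silhouette (MSS) index but does not define it. As orientation only, here is a minimal Python sketch of the standard simplified silhouette, which the name suggests MSS builds on. This is not the authors' MSS. Each point is scored as s = (b - a) / max(a, b), where a is its distance to its own cluster centroid and b its distance to the nearest other centroid, and the scores are averaged. Euclidean distance, precomputed centroids, and the function name `simplified_silhouette` are assumptions. In a feature-selection setting the "points" would presumably be features embedded by the dimensionality reduction step; that reading is also an assumption, and the exact MSS definition is given in the paper and the linked repository.

```python
import math
from typing import Sequence

Point = Sequence[float]


def simplified_silhouette(
    points: Sequence[Point],
    labels: Sequence[int],
    centroids: Sequence[Point],
) -> float:
    """Standard simplified silhouette averaged over all points.

    Assumption: Euclidean distance and one centroid per cluster, with
    labels[i] indexing into centroids. NOT the paper's MSS definition.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        # a: distance from the point to the centroid of its own cluster
        a = math.dist(p, centroids[lab])
        # b: distance from the point to the closest *other* centroid
        b = min(math.dist(p, c) for j, c in enumerate(centroids) if j != lab)
        denom = max(a, b)
        # A point sitting exactly on both centroids contributes 0
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    cents = [(0.0, 0.5), (10.0, 0.5)]
    print(simplified_silhouette(pts, [0, 0, 1, 1], cents))  # well separated: close to 1
    print(simplified_silhouette(pts, [1, 1, 0, 0], cents))  # mislabeled: close to -1
```

Values near 1 indicate compact, well-separated clusters and negative values indicate points closer to another cluster's centroid. One could evaluate such a score for several candidate numbers of clusters and compare them; how GB-AFS turns MSS into the final number of selected features is described in the paper.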