Stable feature selection based on instance learning, redundancy elimination and efficient subsets fusion

Author:	Afef Ben Brahim
Publication year:	2020
Subject:	0209 industrial biotechnology; Fusion; Training set; Filter methods; business.industry; Computer science; Feature selection; Pattern recognition; 02 engineering and technology; Statistical classification; 020901 industrial engineering & automation; Artificial intelligence; 0202 electrical engineering, electronic engineering, information engineering; Redundancy (engineering); Preprocessor; 020201 artificial intelligence & image processing; business; Software; Curse of dimensionality
Source:	Neural Computing and Applications. 33:1221-1232
ISSN:	1433-3058 0941-0643
DOI:	10.1007/s00521-020-04971-y
Description:	Feature selection is frequently used as a preprocessing step in data mining and is attracting growing attention due to the increasing amounts of data emerging from different domains. High data dimensionality increases noise and thus the error of learning algorithms. Filter methods for feature selection are particularly fast and useful for high-dimensional datasets. Existing methods focus on producing feature subsets that improve predictive performance, but they often suffer from instability. Instance-based filters, for example, are considered among the most effective methods; they rank features based on the neighborhoods of instances. However, because the feature weights fluctuate with the instances, small changes in the training data result in a different selected subset of features. On the other hand, some other filters generate stable results but lead to only modest predictive performance. The absence of a trade-off between stability and classification accuracy decreases the reliability of the feature selection results. To deal with this issue, we propose filter methods that improve the stability of feature selection while preserving optimal predictive accuracy and without increasing the complexity of the feature selection algorithms. The proposed approaches first use the strength of instance learning to identify initial sets of relevant features, and then, in a second stage, use aggregation techniques to increase the stability of the final set. Two classification algorithms are used to evaluate the predictive performance of our proposed instance-based filters compared to state-of-the-art algorithms. The obtained results show the efficiency of our methods in improving both classification accuracy and feature selection stability for high-dimensional datasets.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::d0a0736f60554c579748717e4a0bba3f https://doi.org/10.1007/s00521-020-04971-y
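
The two-stage idea in the description (instance-based weighting to find candidate features, then aggregation to stabilize the final subset) can be illustrated with a minimal sketch. This is not the author's algorithm: the record gives no implementation details, so the sketch swaps in a basic Relief-style weighting, bootstrap resampling, and mean-rank aggregation as stand-ins. The function names `relief_weights`, `aggregated_selection`, and `jaccard`, and the parameters `n_iter`, `n_subsets`, and `k`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def relief_weights(X, y, rng, n_iter=50):
    """Basic Relief-style weights (an assumed stand-in, not the paper's filter).

    A feature gains weight when it differs on the nearest instance of another
    class (miss) and loses weight when it differs on the nearest instance of
    the same class (hit).
    """
    n, d = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0              # avoid division by zero on constant features
    Xs = (X - lo) / span               # scale every feature to [0, 1]
    w = np.zeros(d)
    for i in rng.integers(0, n, size=n_iter):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)   # Manhattan distance to instance i
        dist[i] = np.inf                          # exclude the instance itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / n_iter


def aggregated_selection(X, y, k, n_subsets=20, seed=0):
    """Rank features on several bootstrap samples and keep the k features
    with the best mean rank (one simple aggregation choice among many)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mean_rank = np.zeros(d)
    for _ in range(n_subsets):
        idx = rng.choice(n, size=n, replace=True)    # bootstrap sample
        w = relief_weights(X[idx], y[idx], rng)
        mean_rank += np.argsort(np.argsort(-w))      # rank 0 = highest weight
    return np.sort(np.argsort(mean_rank)[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets, a common stability measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 20))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)          # synthetic labels
    # Compare the subsets selected from two perturbed copies of the data.
    half_a = rng.choice(100, size=80, replace=False)
    half_b = rng.choice(100, size=80, replace=False)
    sel_a = aggregated_selection(X[half_a], y[half_a], k=5)
    sel_b = aggregated_selection(X[half_b], y[half_b], k=5)
    print("subset A:", sel_a)
    print("subset B:", sel_b)
    print("stability (Jaccard):", jaccard(sel_a, sel_b))
```

Comparing the Jaccard score obtained with `n_subsets=1` against larger values is one way to see whether aggregation reduces the sensitivity to training-set changes that the description attributes to instance-based filters. Results on the synthetic data above say nothing about the results reported in the paper.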