Improving nature-inspired algorithms for feature selection

Authors:	Zakariya Yahya Algamal, Omar Saber Qasim, Niam Abdulmunim Al-Thanoon
Year of publication:	2021
Subject:	0209 industrial biotechnology; General Computer Science; Computer science; Population; Initialization; Feature selection; 02 engineering and technology; 020901 industrial engineering & automation; Discriminative model; Feature (computer vision); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Standard algorithms; Class variable; Algorithm; Parametric statistics
Source:	Journal of Ambient Intelligence and Humanized Computing. 13:3025-3035
ISSN:	1868-5145 1868-5137
DOI:	10.1007/s12652-021-03136-6
Description:	Selecting highly discriminative features from a whole feature set has become an important research area. Not only can this improve classification performance, but it can also decrease the cost of system diagnosis when a large number of noisy, redundant features are excluded. Binary nature-inspired algorithms have been used as feature selection procedures. Each of these algorithms requires an initial population, and the appropriateness of the initialization plays a key role in the final result. At the population initialization stage, the positions are usually drawn at random from a uniform distribution, which leads to high variability in the classification results. To avoid this randomness and to take into account the relation between each feature and the class variable, parametric and non-parametric methods, such as the t-test and the Wilcoxon rank sum test, are proposed for generating the initial population in binary nature-inspired algorithms. This modification can help these binary algorithms enhance global exploration and local exploitation and avoid the slow convergence of the standard procedure. The binary bat, gray wolf, and whale algorithms are considered. The performance of the proposed methods is evaluated on ten publicly available datasets with high-dimensional and low-dimensional data. The experimental results and statistical analysis confirm that the proposed methods outperform the standard algorithms in terms of classification accuracy, number of selected features, running time, and feature selection stability.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::380f8086a948a661c3bade6de7eda15b https://doi.org/10.1007/s12652-021-03136-6
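
The initialization idea in the abstract (biasing the initial binary population by each feature's relation to the class variable instead of sampling uniformly) could look roughly like the sketch below. This is not the authors' implementation: the record does not say how test statistics are turned into initial positions, so the rank-based mapping to inclusion probabilities, the bounds `p_min` and `p_max`, and all function names are assumptions made for illustration. The score used here is the absolute Welch t-statistic for a two-class problem; replacing `welch_t` with a rank-sum statistic would give a non-parametric variant.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t-statistic for one feature split by class.

    Each sample needs at least two values (required by statistics.variance).
    """
    se2 = variance(a) / len(a) + variance(b) / len(b)
    diff = abs(mean(a) - mean(b))
    if se2 == 0.0:
        # Zero variance in both classes: perfectly separating or uninformative.
        return float("inf") if diff > 0 else 0.0
    return diff / se2 ** 0.5


def feature_scores(X, y):
    """Score every column of X (a list of rows) against binary labels y (0/1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(welch_t(a, b))
    return scores


def selection_probabilities(scores, p_min=0.1, p_max=0.9):
    """Map scores to inclusion probabilities by rank (hypothetical scaling)."""
    d = len(scores)
    order = sorted(range(d), key=lambda j: scores[j])  # weakest feature first
    probs = [0.0] * d
    for rank, j in enumerate(order):
        frac = rank / (d - 1) if d > 1 else 1.0
        probs[j] = p_min + (p_max - p_min) * frac
    return probs


def init_population(X, y, n_agents, rng=None):
    """Draw n_agents binary position vectors; bit j is 1 with probability probs[j]."""
    rng = rng or random.Random()
    probs = selection_probabilities(feature_scores(X, y))
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_agents)]
```

Under these assumptions, a call such as `init_population(X, y, n_agents=30)` returns 30 bit vectors that a binary bat, gray wolf, or whale optimizer could take as starting positions. Features with stronger class separation are switched on more often, while the bounds keep every feature reachable so the search is not fully deterministic.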