Robust Feature Selection by Filled Function and Fisher Score

Author: Javad Hamidzadeh, Mahsa Kelidari
Year of publication: 2022
DOI: 10.21203/rs.3.rs-1102788/v1
Description: Feature selection is essential in high-dimensional data analysis, and filter algorithms have drawn increasing attention in recent years because of their simplicity and speed. Retaining all features in a machine learning task is not only inefficient; irrelevant and redundant features may also degrade classification accuracy. Feature selection is an optimization problem that aims to reduce the dataset's high-dimensional space to a lower-dimensional one by retaining only the relevant and suitable features. Although feature selection is itself time-consuming, it substantially reduces the time required by the learning algorithm. In this paper, we introduce a supervised filter feature selection method based on a filled function and the Fisher score (FFFS). Using this criterion, we seek the feature subset that yields the lowest classification error rate. To evaluate the effectiveness of the proposed algorithm, extensive experiments were conducted on 20 high-dimensional real-world datasets. The experimental results show that the proposed algorithm achieves a lower minimum classification error rate than state-of-the-art algorithms. Results validated through statistical analysis indicate that the proposed algorithm outperforms the reference algorithms by minimizing the redundancy of the selected features, so the selected feature subset avoids serious negative impacts on the classification process in real-world datasets. In addition, the paper demonstrates the ability of the proposed algorithm to select the most relevant features for classification by applying different noise rates to the datasets. According to the experiments, FFFS is less affected by noisy attributes than the other algorithms. It is therefore a reasonable solution for handling noise and avoiding serious negative impacts on the classification error rate in real-world datasets.
Database: OpenAIRE
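
Note: The description names the Fisher score as one ingredient of FFFS but does not give its formula. As a rough illustration only, the sketch below computes the commonly used per-feature Fisher score (between-class scatter divided by within-class scatter) and ranks features by it. The function names `fisher_scores` and `top_k_features` are hypothetical, the filled-function optimization step is not covered, and the paper's exact formulation may differ.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column.

    X is a list of rows (each a list of numbers); y holds the class labels.
    This is the textbook per-feature score, assumed here for illustration.
    """
    n_features = len(X[0])

    # Group the rows by class label.
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # class-size-weighted spread of class means
        within = 0.0   # class-size-weighted variance inside each class
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            n_c = len(values)
            mean_c = sum(values) / n_c
            var_c = sum((v - mean_c) ** 2 for v in values) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # Guard against division by zero for features constant within classes.
        scores.append(between / max(within, 1e-12))
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]
```

A feature whose class means are far apart relative to its within-class variance scores high, while a feature with identical class means scores zero. Ranking by this score alone ignores redundancy between features, which the description says FFFS is designed to reduce.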