Building an Ensemble Feature Selection Approach for Cancer Microarray Datasets Using Different Classifiers.

Authors: Sayed, Sabah; Nassef, Mohammad; Badr, Amr; Farag, Ibrahim
Source: International Journal of Intelligent Engineering & Systems; 2019, Vol. 12 Issue 4, p50-61, 12p
Abstract: The challenge of processing microarray datasets, given their high dimensionality, has opened multiple research directions. Various feature selection techniques have been employed to reduce the dimensionality of such datasets before classification algorithms are applied. This study presents an ensemble feature selection approach based on the t-test and a Genetic Algorithm that uses five different classification algorithms as its fitness function: Support Vector Machine, Random Forest, Nearest Centroid, K Nearest Neighbour, and Maximum Likelihood, with 5-fold cross-validation. The proposed approach has been applied to two lung cancer datasets, a microarray gene expression dataset and a DNA methylation dataset, with the aim of finding lung cancer biomarker genes. The experimental results showed that the three genes (DLX5, KRT5, and SELENBP1) obtained by processing both datasets achieve a higher classification accuracy (92.31%) than processing the gene expression and DNA methylation datasets separately (90.38% and 86.54%, respectively). Moreover, other research studies reached the classification accuracy obtained with these three genes only by using more genes. [ABSTRACT FROM AUTHOR]
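The abstract describes a filter step (t-test) followed by a Genetic Algorithm whose fitness is classifier accuracy under 5-fold cross-validation. The following is a minimal pure-Python sketch of that general filter-then-wrapper pattern, not the authors' implementation. It assumes a two-class problem, ranks features by Welch's t-statistic, uses truncation selection with one-point crossover and bit-flip mutation, adds a hypothetical 0.001-per-feature size penalty, and uses only one of the five named classifiers (Nearest Centroid) as the fitness. All function names, parameters, and the toy data are hypothetical; the abstract does not say how the "ensemble" combines classifiers or datasets, so that part is not modeled.

```python
import random
import statistics
from math import sqrt


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of one feature."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = sqrt(va / len(a) + vb / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def t_test_filter(X, y, top_k):
    """Filter step: indices of the top_k features ranked by |t| (classes 0 vs 1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((welch_t(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:top_k]]


def nearest_centroid_cv(X, y, features, folds=5):
    """Pooled k-fold CV accuracy of a nearest-centroid classifier on `features`."""
    if not features:
        return 0.0
    correct = 0
    for f in range(folds):
        # Fold f holds every sample whose index i satisfies i % folds == f.
        test = list(range(f, len(X), folds))
        train = [i for i in range(len(X)) if i % folds != f]
        centroids = {}
        for c in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [statistics.mean(r[j] for r in rows) for j in features]
        for i in test:
            point = [X[i][j] for j in features]
            # Predict the class whose centroid is closest in squared Euclidean distance.
            pred = min(centroids, key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            correct += pred == y[i]
    return correct / len(X)


def genetic_select(X, y, candidates, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Wrapper step: evolve a bit mask over `candidates` (pop_size must be >= 4)."""
    rng = random.Random(seed)
    n = len(candidates)

    def decode(mask):
        return [candidates[i] for i in range(n) if mask[i]]

    def fitness(mask):
        feats = decode(mask)
        # Hypothetical size penalty so that smaller gene sets win ties.
        return nearest_centroid_cv(X, y, feats) - 0.001 * len(feats)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]  # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = [(not g) if rng.random() < p_mut else g for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = elite + children
    return decode(max(pop, key=fitness))


def make_toy_data(n_per_class=20, n_features=30, seed=1):
    """Hypothetical toy data: only features 0 and 1 differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += 3 * label
            row[1] -= 3 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    top = t_test_filter(X, y, top_k=10)
    chosen = genetic_select(X, y, top)
    print("t-test shortlist:", top)
    print("GA-selected features:", chosen, "CV accuracy:", nearest_centroid_cv(X, y, chosen))
```

The filter keeps the GA's search space small, which matters when microarrays have thousands of genes. Because the GA scores subsets on the same folds it reports, the printed accuracy is optimistic; a held-out evaluation set (or nested cross-validation) gives a fairer estimate. On real data, library implementations such as scikit-learn's classifiers would normally replace the hand-written nearest-centroid routine.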
Database: Complementary Index