Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods.

Authors: Ghosh, Manosij; Adhikary, Sukdev; Ghosh, Kushal Kanti; Sardar, Aritra; Begum, Shemim; Sarkar, Ram
Source: Medical & Biological Engineering & Computing; Aug 2018, Vol. 56, Issue 8
Abstract: Microarray datasets play a crucial role in cancer detection. However, their high dimensionality makes classification challenging because they contain many irrelevant and redundant features. Feature selection is therefore indispensable in this field, since it removes unneeded features from the system. Because selecting an optimal feature subset is an NP-hard problem, meta-heuristic search techniques are used to cope with it. In this paper, we propose a two-stage model for feature selection in microarray datasets. The gene rankings produced by different filter methods are quite diverse, and the effectiveness of each ranking is dataset-dependent. In the first stage, we build an ensemble of filter methods by taking the union and the intersection of the top-n features of ReliefF, chi-square, and symmetrical uncertainty. This ensemble lets us combine the information from all three rankings in a single subset. In the second stage, we apply a genetic algorithm (GA) to the union and to the intersection to fine-tune the results; the union performs better than the intersection. We show that the model is classifier-independent by using three classifiers: multi-layer perceptron (MLP), support vector machine (SVM), and K-nearest neighbor (K-NN). We tested the model on five cancer datasets: colon, lung, leukemia, SRBCT, and prostate. Experimental results show the superiority of our model over state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
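The two-stage pipeline described in the abstract (filter ensemble, then GA refinement) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Only symmetrical uncertainty is implemented as a filter score; the ReliefF and chi-square rankings are assumed to be supplied as ready-made, best-first lists of feature indices. Features are assumed to be discretised. The GA operators (binary tournament selection, single-point crossover, bit-flip mutation, elitism), the function names and the parameter defaults are all hypothetical choices, since the abstract does not specify them.

```python
import random
from collections import Counter
from math import log2

# Hypothetical sketch: names, GA operators and defaults are illustrative assumptions.


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * log2(c / total) for c in counts.values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)) for discrete sequences."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # Mutual information from the joint entropy: I = H(X) + H(Y) - H(X, Y)
    mi = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mi / (hx + hy)


def rank_by_score(columns, labels, score):
    """Feature indices ordered best-first by a filter score (ties keep index order)."""
    return sorted(range(len(columns)),
                  key=lambda j: score(columns[j], labels), reverse=True)


def ensemble_top_n(rankings, n):
    """Union and intersection of the top-n features of several rankings."""
    tops = [set(r[:n]) for r in rankings]
    return set.union(*tops), set.intersection(*tops)


def genetic_search(candidates, fitness, pop_size=20, generations=30,
                   crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Bit-mask GA over candidate features; returns the best subset found.

    `fitness` maps a list of selected features to a score to maximise
    (assumed here; e.g. cross-validated classifier accuracy).
    """
    rng = random.Random(seed)
    k = len(candidates)

    def decode(mask):
        return [f for f, bit in zip(candidates, mask) if bit]

    def score(mask):
        return fitness(decode(mask))

    pop = [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        # Binary tournament selection of parents
        parents = [max(rng.sample(pop, 2), key=score) for _ in range(pop_size)]
        children = []
        for i in range(0, pop_size, 2):
            a, b = parents[i][:], parents[(i + 1) % pop_size][:]
            if k > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, k - 1)  # single-point crossover
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children.extend([a, b])
        for child in children:
            for j in range(k):  # bit-flip mutation
                if rng.random() < mutation_rate:
                    child[j] ^= 1
        pop = children[:pop_size]
        best = max(pop + [best], key=score)  # elitism: never lose the best mask
    return decode(best)
```

In a setting like the paper's, `rankings` would hold the ReliefF, chi-square and SU orderings computed on training data, `candidates` would be the union (or intersection) returned by `ensemble_top_n`, and `fitness` would wrap a classifier such as MLP, SVM or K-NN; that wiring is an assumption, as the abstract does not describe the fitness function. Scores are recomputed rather than cached, which keeps the sketch short but is wasteful when each fitness call trains a classifier.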