Abstrakt: |
Microarray data have become an integral part of the clinical and drug discovery process. Due to its voluminous and heterogeneous nature, the question arises of the interpretability and stability of the traditional gene selection method. To enhance the stability of the gene selection method, so that the results are better explicable, an ameliorated Extended ReliefF gene selection algorithm is proposed. It encodes gene affinity information using a new mathematical formula based on Bayes' theorem and Manhattan distance for calculating the nearest neighbor in a pooled sample. It works in four aspects: initializing sample gene weight, improving gene weight, maximizing sample gene weight and finally adopting mutation operation. The proposed method selects the most informative genes which are highly perceptive to the prognosis of the disease. Further, to accomplish the accuracy and stability of the algorithm, soft classification is performed on Relieved_F, STIR, VLS-RelifF, I-RelieF, conventional ReliefF and proposed extended ReliefF algorithms using three classifiers namely Support Vector Machine (SVM), Multilayer Perceptron (MLP) and Random Forest (RF) on ten microarray datasets. According to the findings, MLP training times are much longer than those of RF and SVM. From a network perspective, SVM is much faster at training, whereas MLP excels in terms of accuracy. With a rise in gene similarity among the genes selected from the multiple training sets, the approach becomes more stable. As a result, it can be seen that the recommended gene selection algorithm greatly outperforms the other feature selection methods in terms of accuracy and stability. [ABSTRACT FROM AUTHOR] |