An enhanced soft-computing based strategy for efficient feature selection for timely breast cancer prediction: Wisconsin Diagnostic Breast Cancer dataset case.

Authors: Singh, Law Kumar; Khanna, Munish; Singh, Rekha
Subject:
Source: Multimedia Tools & Applications; Sep 2024, Vol. 83, Issue 31, pp. 76607-76672 (66 pages)
Abstract: Feature selection (FS) is a critical data-preparation step for improving the overall performance of machine learning (ML) models. Metaheuristic FS algorithms have become markedly more popular in recent years because they can accurately identify and select the features most relevant to an ML task. This study presents three metaheuristic FS strategies: the Gravitational Search Optimization Algorithm (GSA), Emperor Penguin Optimization (EPO), and a hybrid of GSA and EPO referred to as hGSAEPO. Previous research has applied the baseline algorithms to FS in various ML tasks, but their application to breast cancer (BC) classification specifically has received little investigation, and this is the first time the two have been combined. BC was chosen as the subject of investigation because it is recognized as the second most prevalent cause of mortality in the female population. If the disease is detected in its early stages, it can be treated, sparing patients unnecessary medical procedures. Selecting relevant features is therefore important for predicting diseases such as BC. The proposed methodology employs three soft-computing algorithms (EPO, GSA, and the proposed hybrid hGSAEPO) to identify significant features while discarding irrelevant ones, reducing overall complexity and improving accuracy. Combined with six ML classifiers, these soft-computing methods form a viable framework for prognostic research, evaluated by classifying data instances from the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
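To make the wrapper idea concrete, the following is a minimal, illustrative sketch of binary GSA feature selection; it is not the authors' implementation. It assumes a synthetic dataset in place of WDBC, a leave-one-out 3-NN classifier in place of the six classifiers, a fitness of alpha * error + (1 - alpha) * (selected / total) that is common in the metaheuristic FS literature, Hamming distance between masks, and the |tanh(v)| bit-flip rule of binary GSA. All function names and parameter values are hypothetical. EPO and the hGSAEPO hybrid are not reproduced, since the abstract does not describe how the two are combined.

```python
import math
import random


def make_toy_data(n=40, n_noise=6, seed=1):
    """Hypothetical stand-in for WDBC: 2 informative features followed by pure-noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [3.0 * label + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def knn_loo_error(X, y, mask, k=3):
    """Leave-one-out error of a k-NN classifier that only sees the selected features."""
    cols = [d for d, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # an empty subset is treated as useless
    wrong = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum((xi[d] - xj[d]) ** 2 for d in cols), y[j])
            for j, xj in enumerate(X) if j != i
        )
        votes = [label for _, label in neighbours[:k]]
        wrong += max(set(votes), key=votes.count) != y[i]
    return wrong / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Assumed wrapper objective: weighted error plus fraction of features kept (lower is better)."""
    return alpha * knn_loo_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)


def binary_gsa(X, y, n_agents=10, iters=15, g0=1.0, decay=10.0, seed=0):
    """Binary GSA: agents are 0/1 feature masks; heavier (fitter) agents attract the rest."""
    rng = random.Random(seed)
    dim = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_mask, best_fit = None, math.inf
    for t in range(iters):
        fits = [fitness(X, y, p) for p in pos]
        for p, f in zip(pos, fits):
            if f < best_fit:
                best_mask, best_fit = p[:], f
        if t == iters - 1:
            break  # final positions have been evaluated; no further move needed
        best, worst = min(fits), max(fits)
        # Normalised masses: the best agent gets the largest mass, the worst gets 0.
        raw = [(worst - f) / (worst - best) if worst > best else 1.0 for f in fits]
        total = sum(raw)
        mass = [m / total for m in raw]
        g = g0 * math.exp(-decay * t / iters)  # gravitational "constant" decays over time
        kbest = max(1, round(n_agents * (1 - t / iters)))  # fewer attractors in later steps
        heavy = sorted(range(n_agents), key=lambda j: mass[j], reverse=True)[:kbest]
        old = [p[:] for p in pos]  # all forces use positions from the start of the step
        for i in range(n_agents):
            # Hamming distance stands in for the Euclidean R_ij on binary positions.
            dist = {j: sum(a != b for a, b in zip(old[i], old[j])) for j in heavy}
            for d in range(dim):
                acc = sum(
                    rng.random() * g * mass[j] * (old[j][d] - old[i][d]) / (dist[j] + 1e-9)
                    for j in heavy if j != i
                )
                vel[i][d] = rng.random() * vel[i][d] + acc
                # Binary GSA transfer rule: flip the bit with probability |tanh(v)|.
                if rng.random() < abs(math.tanh(vel[i][d])):
                    pos[i][d] = 1 - old[i][d]
    return best_mask, best_fit


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, fit = binary_gsa(X, y)
    print("selected features:", [d for d, b in enumerate(mask) if b], "fitness:", round(fit, 4))
```

The small feature-count penalty (1 - alpha) is what pushes the search toward smaller subsets once classification error stops improving; in a real study the k-NN stand-in would be replaced by the actual classifiers and a proper validation split.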
The results of the eight experiments conducted indicate that the proposed approach performs exceptionally well on binary BC classification, achieving a precision of 0.9800, sensitivity of 0.9700, specificity of 0.9887, F1-score of 0.9539, area under the curve (AUC) above 0.998, and accuracy of 98.31%. We achieved our objectives by presenting a dependable clinical prediction system that supports healthcare professionals in efficient diagnosis. [ABSTRACT FROM AUTHOR]
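The reported figures are standard confusion-matrix metrics. As a hedged illustration, the sketch below computes them from hypothetical counts; the function name `binary_metrics` and the counts are our own, not taken from the study. F1 is the harmonic mean of precision and sensitivity. AUC is omitted because it requires per-instance classifier scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics for a binary task (positive class assumed to be malignant)."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only; these are not results from the study.
print(binary_metrics(tp=40, fp=2, tn=70, fn=3))
```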
Database: Complementary Index