The stratified K-folds cross-validation and class-balancing methods with high-performance ensemble classifiers for breast cancer classification

Authors: Mahesh T R, Vinoth Kumar V, Dhilip Kumar V, Oana Geman, Martin Margala, Manisha Guduri
Language: English
Year of publication: 2023
Subject:
Source: Healthcare Analytics, Vol 4, Iss , Pp 100247- (2023)
Document type: article
ISSN: 2772-4425
DOI: 10.1016/j.health.2023.100247
Description: Breast cancer is one of the most common causes of death among women, and early diagnosis is vital for reducing the fatality rate. This study evaluates the most widely used machine-learning methods for breast cancer prediction and diagnosis. We use synthetic minority over-sampling to handle imbalanced data in the breast cancer diagnosis dataset obtained from the Wisconsin Machine Learning Repository. We apply a variety of machine learning algorithms to the classification task, including Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbours (KNN), Classification and Regression Tree (CART), Naive Bayes (NB), and well-known ensemble methods such as Majority-Voting, the eXtreme Gradient Boosting algorithm (XGBoost), and Random Forest (RF). The findings show that the Majority-Voting ensemble, built on the top three classifiers (LR, SVM, and CART), outperforms every individual classifier and achieves the highest accuracy, 99.3%.
Database: Directory of Open Access Journals
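
Illustration: The description names three techniques: stratified K-fold splitting, synthetic minority over-sampling, and majority voting over three classifiers. The sketch below shows the core idea of each in plain Python. It is not the authors' implementation. The function names, the toy labels ("B" for benign, "M" for malignant), and the fold count are assumptions chosen for illustration, and `smote_like_sample` is a deliberate simplification: it interpolates between two given minority points, whereas SMOTE proper picks the second point from the nearest minority-class neighbours.

```python
import random
from collections import Counter, defaultdict


def stratified_k_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def smote_like_sample(a, b, rng):
    """Create one synthetic point on the line segment between points a and b."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]


def majority_vote(predictions):
    """Hard voting: pick the most common label per sample across classifiers."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): 10 benign and 5 malignant samples, 5 folds.
labels = ["B"] * 10 + ["M"] * 5
folds = stratified_k_folds(labels, k=5)

# One synthetic minority point between two hypothetical feature vectors.
synthetic = smote_like_sample([1.0, 2.0], [3.0, 6.0], random.Random(0))

# Each inner list holds one classifier's predictions for three samples.
votes = majority_vote([
    ["M", "B", "M"],
    ["M", "B", "B"],
    ["B", "B", "M"],
])  # ["M", "B", "M"]
```

With an odd number of voters and two classes, as in the three-classifier ensemble the description mentions, hard voting cannot tie. In practice, over-sampling is usually applied to the training folds only, so that synthetic points do not leak into the held-out fold.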