Performance Assessment of Ensemble-Tree Learning Models on Breast Cancer Dataset

Authors: O. I. ALABI, O. J. FATABO, S. C. OKENU, T. A. ONIMISI, G. O. OYERINDE, I. J. UDOH, A. I. OZOEZE, A. C. EGBA
Language: English, French
Publication year: 2024
Subject:
Source: Journal of Information Sciences, Vol 23, Iss 1 (2024)
Document type: article
ISSN: 1113-4844, 2820-6894
DOI: 10.34874/IMIST.PRSM/jis-v23i1.41823
Description: Advances in feature extraction enable the collection of prognostic data values that can be used to distinguish between benign and malignant tumours. While single learning models are capable of making predictions, combining weak learners to form an ensemble can improve predictive performance. This study evaluates and compares the performance of selected ensemble-tree machine learning models applied to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The dataset is split into a 60% training set and a 40% test set. A Random Forest classifier, an Extremely Randomized Trees classifier, a Gradient Boosting Machine classifier and an Extreme Gradient Boosting classifier were each initialized with 3 weak learners and fit to the training set, with subsequent predictions made on the test set. Evaluation metrics include accuracy, area under the receiver operating characteristic curve (AUROC), precision-recall curves and F2 scores, followed by a stratified 5-fold cross-validation procedure. When precision and recall are given greater weight, the Extreme Gradient Boosting classifier and the Extremely Randomized Trees classifier perform better, with average accuracies of 0.9386 and 0.9460 respectively. Overall, the Extremely Randomized Trees classifier outperforms the rest of the models with an average F2 score of 0.4232.
Keywords: Breast cancer; Classification models; Tree-based Ensemble; Supervised learning
Database: Directory of Open Access Journals
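
Note on the F2 score: the abstract reports F2 scores alongside accuracy. The sketch below shows one common way to compute an F-beta score from confusion-matrix counts and why beta = 2 favours recall, which matters when a missed malignant tumour is costlier than a false alarm. The function name f_beta and all counts are hypothetical, chosen only for illustration; they are not taken from the study, and this is not presented as the authors' code.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion-matrix counts; beta > 1 weights recall more."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for two classifiers (illustration only, not study data).
# Classifier A is precise but misses 20 malignant cases (false negatives).
# Classifier B catches more malignant cases at the cost of more false alarms.
f2_a = f_beta(tp=80, fp=5, fn=20)
f2_b = f_beta(tp=95, fp=20, fn=5)

# With beta = 1 the same function gives the familiar F1 score.
f1_a = f_beta(tp=80, fp=5, fn=20, beta=1.0)
f1_b = f_beta(tp=95, fp=20, fn=5, beta=1.0)

print(f"A: F1={f1_a:.4f}  F2={f2_a:.4f}")
print(f"B: F1={f1_b:.4f}  F2={f2_b:.4f}")
```

Under these assumed counts, classifier B scores higher on F2 than classifier A because F2 penalises false negatives four times as heavily as false positives.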