Toward an Accurate Liver Disease Prediction Based on Two-Level Ensemble Stacking Model

Authors:	Marghany Hassan Mohamed, Botheina Hussein Ali, Ahmed Ibrahim Taloba, Ahmad O. Aseeri, Mohamed Abd Elaziz, Shaker El-Sappagah, Nora Mahmoud El-Rashidy
Language:	English
Publication year:	2024
Subject:	Ensemble stacking; feature selection; ILPD dataset; liver disease prediction; machine learning; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 180210-180237 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3459429
Description:	Liver disease is difficult to detect at an early stage because it presents few symptoms. In this study, single and ensemble machine learning (ML) algorithms are applied to the Indian Liver Patient Dataset (ILPD), and their results, without and with feature selection, are compared with each other and with existing studies. In addition, a two-level ensemble stacking model, based on several meta-ensemble classifiers and feature selection, is applied to optimize the accuracy of the ensemble classifiers. Several data preprocessing techniques are employed to optimize the accuracy of the proposed work: data encoding, data cleaning, data scaling, skewness transformation, data balancing, and feature selection. The single ML models are logistic regression (LR), K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), and multilayer perceptron (MLP). The ensemble ML models are extra trees classifier, random forest (RF), gradient boosting, AdaBoost, extreme gradient boosting (XGBoost), and an ensemble stacking classifier. Among the ensemble models, the ensemble stacking model achieved the highest accuracies (93.88% without and 94.12% with feature selection) under 10-fold cross-validation. The two-level ensemble stacking model achieved the highest performance when trained with feature selection: accuracy 94.01%, precision 94.44%, recall 94.25%, F1-score 94.01%, and area under the ROC curve 94.25%. These results indicate that the proposed technique yields a highly accurate prediction model for liver disease.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/3f73bca9f0d94533ad3e35a6fad6923d
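
The two-level stacking pipeline summarized in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The choice of base learners at each level, the logistic-regression meta-learners, `SelectKBest` as the feature selector, the value `k_features=8`, and the synthetic data from `make_classification` are all assumptions made for illustration. The function names `build_two_level_stack` and `evaluate` are hypothetical. The real ILPD data and the paper's encoding, skewness, and balancing steps are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_two_level_stack(k_features=8, seed=0):
    """Hypothetical helper: scaling -> feature selection -> two-level stack."""
    # Level 1 (assumed layout): single models combined by a meta-learner.
    level_one = StackingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier()),
            ("dt", DecisionTreeClassifier(random_state=seed)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Level 2 (assumed layout): ensemble models plus the level-1 stack,
    # combined by a second meta-learner.
    level_two = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=20, random_state=seed)),
            ("et", ExtraTreesClassifier(n_estimators=20, random_state=seed)),
            ("gb", GradientBoostingClassifier(n_estimators=20, random_state=seed)),
            ("stack1", level_one),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    # Scaling and selection live inside the pipeline so they are refit on
    # each training fold and never see the held-out fold.
    return make_pipeline(
        StandardScaler(),
        SelectKBest(f_classif, k=k_features),
        level_two,
    )


def evaluate(n_splits=10, seed=0):
    """Hypothetical helper: k-fold accuracy on synthetic stand-in data."""
    # Synthetic binary classification data, NOT the ILPD dataset.
    X, y = make_classification(
        n_samples=300, n_features=10, n_informative=5, random_state=seed
    )
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    model = build_two_level_stack(seed=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


if __name__ == "__main__":
    scores = evaluate()
    print(f"mean 10-fold accuracy: {scores.mean():.3f}")
```

Keeping the preprocessing steps inside the pipeline is a deliberate choice: when cross-validation wraps the whole pipeline, feature selection is repeated per fold, which avoids leaking information from the validation fold into the selected features. Scores from this sketch say nothing about the figures reported in the abstract, since the data are synthetic.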