Comparative Analysis of Machine Learning Algorithms for CKD Risk Prediction

Authors:	Weilin Yang, Nasim Ahmed, Andre L. C. Barczak
Language:	English
Publication year:	2024
Subject:	Chronic kidney disease; machine learning; deep learning; risk prediction; healthcare analytics; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 171205-171220 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3499355
Description:	Chronic Kidney Disease (CKD) remains a significant global health challenge, with increasing prevalence and a substantial impact on patient quality of life. Early and accurate prediction of CKD risk is crucial for timely intervention and management. This study presents a comprehensive comparative analysis of machine learning and deep learning algorithms applied to CKD risk prediction. Eight traditional machine learning algorithms (Naive Bayes, K-nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, AdaBoost, and XGBoost) were applied to a CKD dataset retrieved from the UCI data repository. Three neural network-based algorithms, Artificial Neural Network (ANN), Simple Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM), were then compared against the traditional algorithms. The study assessed each algorithm’s performance in terms of accuracy, precision, recall, and F1 score, and also examined computational efficiency and applicability in real-world clinical settings. All eleven algorithms were trained on three versions of the dataset. The first version kept the original class imbalance and used KNN imputation to fill in missing values (unbalanced). The second version used SMOTENC to create new samples to balance the classes (balanced). The third version used feature selection to choose 14 of the original 24 features. The results showed almost no performance difference among the classifiers produced with the balanced, unbalanced, and feature-selection datasets. This means that the best algorithms for this task are those with short training and testing runtimes, namely Random Forest (RF), Support Vector Machine (SVM), AdaBoost, and XGBoost. The experiments also showed that the neural network-based algorithms had no performance advantage and were slower to train, which the authors attribute to the small number of samples available in the original dataset.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/1add961bcd754646b1188611b7bcd959
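
The description names four evaluation metrics: accuracy, precision, recall, and F1 score. As a minimal sketch of how these are conventionally derived from confusion-matrix counts for a binary CKD / not-CKD label, the following may help. It is not the authors' code; the function name `binary_metrics` and the sample labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumed encoding (not taken from the paper): 1 = CKD, 0 = not CKD.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only; these are not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision, recall, and F1 alongside accuracy is the usual choice when classes are unbalanced, since accuracy alone can look high for a classifier that mostly predicts the majority class.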