The prediction of Chronic Kidney Disease stages and mining risk factors using machine learning

Author: Chen, Chia-Hsin, 陳家欣
Publication year: 2018
Document type: thesis
Description: 106
Detecting Chronic Kidney Disease (CKD) at an early stage can help prevent at-risk patients from progressing to kidney failure. One of our main objectives is to develop prediction models for CKD using common measurement items, which could assist medical doctors during diagnosis. In addition, this study identifies the factors related to living habits, personal illness, and family illness history that affect the risk probability of CKD. We analyze a data set comprising medical examination results and a psychosomatic questionnaire for 4910 individuals who participated in the community health check project of Chang Gung Memorial Hospital, Keelung. The algorithms used in the study are stepwise logistic regression (SLR) and machine learning (ML) techniques, namely XGBoost, Random Forest (RF), and Support Vector Machine (SVM). We develop a two-stage model to predict CKD and its progression. The first-stage model predicts whether a patient has CKD. If the patient is classified as having CKD, the second-stage model is applied to determine the stage of CKD. In the first stage, the XGBoost algorithm with 7 features is the best model, achieving a classification AUC of 0.963 and an F-score of 0.943. In the second stage, XGBoost with 4 features is again the best algorithm, reaching a macro-average F-score of 0.951 and keeping the F-score above 0.9 for every stage. To identify the living habit and illness factors that affect the risk probability of CKD, we use SLR to select the statistically significant features and build an explainable model. This model achieves an AUC of 0.751 and an F-score of 0.798 with 37 significant features. We then analyze how the risk factors influence the risk probability of CKD in the model. We find that cigarette smoking increases the risk, whereas drinking coffee has no impact on the kidneys. To lower the risk probability, patients should exercise for more than 30 minutes, with 80 minutes being the best. These results are useful for developing personalized health management strategies and intervention plans to prevent CKD or slow its progression.
Database: Networked Digital Library of Theses & Dissertations
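
The two-stage prediction and the macro-average F-score mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the classifiers are passed in as plain callables, and the names `two_stage_predict`, `f_score`, `macro_f_score`, `toy_has_ckd`, `toy_stage`, the feature name `marker`, and its cutoff values are all hypothetical placeholders chosen for illustration, not clinical thresholds or features from the study.

```python
from typing import Callable, Dict, Hashable, Optional, Sequence

Features = Dict[str, float]


def two_stage_predict(
    sample: Features,
    has_ckd: Callable[[Features], bool],
    ckd_stage: Callable[[Features], int],
) -> Optional[int]:
    """Run stage 1; only if it predicts CKD, run stage 2 to get the CKD stage."""
    if not has_ckd(sample):
        return None  # stage 1 predicts no CKD, so stage 2 is skipped
    return ckd_stage(sample)


def f_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F-score (harmonic mean of precision and recall) for a single class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F-scores over all observed classes."""
    labels = set(y_true) | set(y_pred)
    return sum(f_score(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical stand-ins for trained models (assumptions, not the thesis's models).
def toy_has_ckd(sample: Features) -> bool:
    return sample["marker"] > 1.0


def toy_stage(sample: Features) -> int:
    return 3 if sample["marker"] < 2.0 else 4


samples = [{"marker": 0.5}, {"marker": 1.5}, {"marker": 2.5}]
predictions = [two_stage_predict(s, toy_has_ckd, toy_stage) for s in samples]
print(predictions)  # [None, 3, 4]

# Macro-average F-score on a small made-up set of stage labels.
print(round(macro_f_score([3, 3, 4, 4], [3, 4, 4, 4]), 4))  # 0.7333
```

In practice the two callables would wrap trained classifiers such as the XGBoost models named in the abstract. Because the macro average weights every class equally, a weak result on a rare stage lowers the score as much as one on a common stage.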