Popis: |
Abstract: Objective To identify risk factors associated with recurrence of minor ischemic stroke (MIS) within 2 years using an interpretable machine learning model. Methods General data, laboratory results, and imaging data of patients with MIS treated in the Department of Neurology, Shanxi Cardiovascular Hospital from July to December 2020 were collected retrospectively. Candidate risk factors for recurrence were screened by univariate analysis. Class imbalance was handled with the synthetic minority oversampling technique for nominal and continuous features (SMOTE-NC). The data set was divided into a training set and a test set at a ratio of 8∶2. Light gradient boosting machine (LightGBM) and support vector machine (SVM) models were built using grid search with 10-fold cross-validation and compared with a logistic regression (LR) model. Discrimination and calibration were evaluated with the area under the ROC curve (AUC) and calibration curves, respectively. The predictions of the best-performing model were interpreted with Shapley additive explanations (SHAP). Results A total of 520 patients with MIS were included, of whom 93 (17.9%) had a recurrence within 2 years.
In the test set, the AUCs of the LightGBM, SVM, and LR models for predicting recurrence within 2 years were 0.935 (95%CI 0.896-0.973), 0.833 (95%CI 0.770-0.896), and 0.764 (95%CI 0.691-0.835), respectively; the accuracies were 0.890, 0.773, and 0.693, and the Brier scores were 0.105, 0.167, and 0.200. The LightGBM model performed best. In the SHAP-based explanation of the LightGBM model, the five most important features were diastolic blood pressure, age, diabetes mellitus, LDL-C, and smoking. Conclusions The LightGBM model established in this study predicted recurrence well and can serve as a reference for predicting recurrence within 2 years in patients with MIS. SHAP-based interpretability helps clinicians understand the reasons behind the model's predictions and make more individualized and rational clinical decisions for patients with MIS. |
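
The abstract reports two probability-based measures, the AUC and the Brier score. As a minimal sketch of how such values are commonly computed, the Python below implements both from their textbook definitions. The function names `roc_auc` and `brier_score` and the toy labels and probabilities are hypothetical illustrations. They are not the study's code or data, and the authors' actual software is not stated in the abstract.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the
    observed 0/1 outcome; lower values are better."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data (1 = recurrence, 0 = no recurrence); NOT study data.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.35, 0.05]

print(f"AUC   = {roc_auc(y_true, y_prob):.3f}")
print(f"Brier = {brier_score(y_true, y_prob):.3f}")
```

For orientation, a model that always predicts a probability of 0.5 has a Brier score of 0.25, and a model that ranks cases at random has an expected AUC of 0.5.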