Evaluation of available risk scores to predict multiple cardiovascular complications for patients with type 2 diabetes mellitus using electronic health records.

Authors: Ho JC; Department of Computer Science, Emory University, 400 Dowman Drive, Atlanta, GA 30322, United States., Staimez LR; Hubert Department of Global Health, Rollins School of Public Health, Emory University, United States., Narayan KMV; Hubert Department of Global Health, Rollins School of Public Health, Emory University, United States., Ohno-Machado L; Department of Biomedical Informatics, School of Medicine, University of California San Diego, United States., Simpson RL; Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, United States., Hertzberg VS; Center for Data Science, Nell Hodgson Woodruff School of Nursing, Emory University, United States.
Language: English
Source: Computer methods and programs in biomedicine update [Comput Methods Programs Biomed Update] 2023; Vol. 3. Date of Electronic Publication: 2022 Dec 19.
DOI: 10.1016/j.cmpbup.2022.100087
Abstract: Aims: Various cardiovascular risk prediction models have been developed for patients with type 2 diabetes mellitus. Yet few models have been validated externally. We perform a comprehensive validation of existing risk models on a heterogeneous population of patients with type 2 diabetes using secondary analysis of electronic health record data.
Methods: Electronic health records of 47,988 patients with type 2 diabetes between 2013 and 2017 were used to validate 16 cardiovascular risk models, including 5 that had not been compared previously, to estimate the 1-year risk of various cardiovascular outcomes. Discrimination and calibration were assessed by the c-statistic and the Hosmer-Lemeshow goodness-of-fit statistic, respectively. Each model was also evaluated based on the missing measurement rate. Sub-analysis was performed to determine the impact of race on discrimination performance.
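To illustrate the two metrics named in the Methods, the following is a minimal sketch of how a c-statistic and a Hosmer-Lemeshow statistic can be computed from binary outcomes and predicted risks. It is not taken from the cvdm package; the function names, the choice of 10 equal-sized risk groups, and the use of g - 2 degrees of freedom are assumptions made for illustration (some external-validation analyses use g degrees of freedom instead). Converting the statistic to a p-value would require a chi-square distribution function, which is left out here.

```python
def c_statistic(y_true, y_score):
    """Concordance (c-statistic) for a binary outcome.

    Fraction of (event, non-event) pairs in which the event has the
    higher predicted risk; tied scores count as half a concordant pair.
    This pairwise form is O(n^2) and meant for illustration only.
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow goodness-of-fit statistic.

    Patients are sorted by predicted risk and split into `groups`
    roughly equal-sized bins. Returns (statistic, degrees_of_freedom),
    with degrees of freedom assumed to be groups - 2.
    """
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of events
        observed = sum(y for _, y in chunk)   # observed number of events
        variance = expected * (1.0 - expected / size)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat, groups - 2
```

A c-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which gives context for the range reported in the Results.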
Results: There was limited discrimination (c-statistics ranged from 0.51 to 0.67) across the cardiovascular risk models. Discrimination generally improved when the model was tailored towards the individual outcome. After recalibration of the models, the Hosmer-Lemeshow statistic yielded p-values above 0.05. However, several of the models with the best discrimination relied on measurements that were often imputed (up to 39% missing).
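The missing measurement rate mentioned above can be sketched as a per-variable fraction of patients lacking a value, summarized over the inputs a given risk model requires. This is an illustrative sketch only: the record layout (one dictionary per patient, with `None` or an absent key marking a missing measurement) and the variable names in the usage note are hypothetical and do not come from the study data or the cvdm package.

```python
def missing_rate(records, variable):
    """Fraction of patient records with no value for `variable`.

    A value counts as missing when the key is absent or set to None
    (an assumed convention for this sketch).
    """
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for r in records if r.get(variable) is None)
    return missing / len(records)


def model_missingness(records, required_variables):
    """Missing rate for each input variable a risk model requires."""
    return {v: missing_rate(records, v) for v in required_variables}


def worst_missingness(records, required_variables):
    """Highest missing rate among a model's required variables."""
    rates = model_missingness(records, required_variables)
    return max(rates.values())
```

For example, calling `model_missingness(records, ["hba1c", "ldl"])` on a list of patient dictionaries returns one rate per variable, which can then be compared across risk models alongside their discrimination.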
Conclusion: No single prediction model achieved the best performance on a full range of cardiovascular endpoints. Moreover, several of the highest-scoring models relied on variables with high missingness frequencies such as HbA1c and cholesterol that necessitated data imputation and may not be as useful in practice. An open-source version of our developed Python package, cvdm, is available for comparisons using other data sources.
Database: MEDLINE