Popis: |
Prediction of cardiovascular disease (CVD) is important in clinical practice. Machine learning (ML) may offer an improved alternative to current CVD risk stratification in individual patients. We aim to identify important predictors and compare ML models with traditional models according to their prediction performance in a large long-term follow-up cohort. The Atherosclerosis Risk in Communities (ARIC) study was designed to study the progression of subclinical disease to cardiovascular events over a 25-year follow-up period. All phenotypic variables at visit 1 were obtained. All-cause death, CVD, and coronary heart disease were the outcomes for analysis. The ML framework involved variable selection using the random survival forest (RSF) method, model building, and 5-fold cross-validation. Model performance was evaluated by discrimination using the Harrell concordance index (C-index), accuracy using the Brier score (BS), and interpretability using the number of variables in the model. Of the 14,842 participants in ARIC, the average age was 54.2 years, with 45.2% male and 26.2% Black participants. Thirty-eight unique variables were selected in the RSF top 20 importance ranking of all 6 outcomes. Aging, hypertension, glucose metabolism, renal function, coagulation, adiposity, and sodium retention dominated the predictions of all outcomes. The ML models outperformed the regression models and established risk scores with a higher C-index, lower BS, and varied interpretability. The ML framework is useful for identifying important predictors of CVD and for developing models with robust performance compared with existing risk models. |
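The two performance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the study's code: the function names `harrell_c_index` and `brier_score` and the toy data are hypothetical. The C-index version skips pairs with tied follow-up times, and the Brier score version ignores censoring, so both are simplifications of what a survival analysis package would compute.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell concordance index for right-censored data (simplified sketch).

    times:  observed follow-up times
    events: 1 if the event occurred, 0 if censored
    risks:  predicted risk scores (higher means an earlier event is expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is comparable
        # only if the shorter follow-up ended in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk had the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(outcomes, predicted_probs):
    """Mean squared difference between predicted event probabilities and
    observed 0/1 outcomes at a fixed horizon. Assumption: no censoring
    before the horizon, so no inverse-probability weighting is applied."""
    squared_errors = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Toy data, invented for illustration only.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risks)
bs = brier_score([1, 0, 1, 0], [0.8, 0.2, 0.6, 0.4])
```

A higher C-index indicates better discrimination (0.5 is chance level, 1.0 is perfect ranking), while a lower Brier score indicates more accurate predicted probabilities, which matches the direction of improvement reported for the ML models.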