Identifying cardiovascular disease risk in the U.S. population using environmental volatile organic compounds exposure: A machine learning predictive model based on the SHAP methodology.

Authors: Fu Q; Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China., Wu Y; Department of Neurosurgery, The Second Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi 330006, China., Zhu M; Gastroenterology Department, The First People's Hospital of Xiushui County, Jiujiang, Jiangxi, China., Xia Y; Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China., Yu Q; Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China., Liu Z; Rheumatology and immunology department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China., Ma X; Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China., Yang R; Cardiovascular medicine department, The Second Affiliated Hospital of Nanchang University, Nanchang, Jiangxi 330006, China. Electronic address: Yang.RQ@ncu.edu.cn.
Language: English
Source: Ecotoxicology and environmental safety [Ecotoxicol Environ Saf] 2024 Nov 01; Vol. 286, pp. 117210. Date of Electronic Publication: 2024 Oct 23.
DOI: 10.1016/j.ecoenv.2024.117210
Abstract: Background: Cardiovascular disease (CVD) remains a leading cause of mortality globally. Environmental pollutants, specifically volatile organic compounds (VOCs), have been identified as significant risk factors. This study aims to develop a machine learning (ML) model to predict CVD risk based on VOC exposure and demographic data, using SHapley Additive exPlanations (SHAP) for interpretability.
Methods: We utilized data from the National Health and Nutrition Examination Survey (NHANES) from 2011 to 2018, comprising 5098 participants. VOC exposure was assessed through 15 urinary metabolite metrics. The dataset was split into a training set (70%) and a test set (30%). Six ML models were developed: Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Multi-Layer Perceptron (MLP), and Support Vector Machine (SVM). Model performance was evaluated using the Area Under the Receiver Operating Characteristic Curve (AUROC), accuracy, balanced accuracy, F1 score, J-index, kappa, Matthews correlation coefficient (MCC), positive predictive value (PPV), negative predictive value (NPV), sensitivity (sens), and specificity (spec). SHAP was applied to interpret the best-performing model.
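Most of the threshold-based metrics named in the Methods follow directly from the four cells of a binary confusion matrix. The sketch below is illustrative only and is not the authors' code: the function name `binary_metrics` and the example counts are hypothetical, and it assumes a non-degenerate matrix (no empty row or column). AUROC is omitted because it needs predicted scores rather than counts.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics for a binary classifier.

    Hypothetical helper for illustration; assumes every marginal
    total is non-zero so that no division by zero occurs.
    """
    n = tp + fp + tn + fn
    sens = tp / (tp + fn)          # sensitivity (recall, true positive rate)
    spec = tn / (tn + fp)          # specificity (true negative rate)
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    acc = (tp + tn) / n            # overall accuracy
    # Chance agreement for Cohen's kappa, from the marginal totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": acc,
        "balanced_accuracy": (sens + spec) / 2,
        "f1": 2 * ppv * sens / (ppv + sens),
        "j_index": sens + spec - 1,  # Youden's J
        "kappa": (acc - p_e) / (1 - p_e),
        "mcc": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        "ppv": ppv,
        "npv": npv,
        "sens": sens,
        "spec": spec,
    }


# Made-up counts, not taken from the study.
example = binary_metrics(tp=40, fp=10, tn=130, fn=20)
```

Balanced accuracy, the J-index, kappa, and MCC all correct in different ways for class imbalance, which is presumably why they are reported alongside plain accuracy for an outcome such as CVD that affects a minority of participants.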
Results: The RF model exhibited the highest predictive performance, with an AUROC of 0.8143. SHAP analysis identified age and ATCA as the most significant predictors, with ATCA showing a protective effect against CVD, particularly in older adults and those with hypertension. The study found a significant interaction between ATCA levels and age, indicating that the protective effect of ATCA is more pronounced in older individuals, which the authors attribute to increased oxidative stress and inflammatory responses associated with aging. E-value analysis suggested robustness to unmeasured confounders.
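For readers unfamiliar with E-values: the E-value is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both exposure and outcome to fully explain away an observed association. The sketch below uses the standard closed form, E = RR + sqrt(RR × (RR − 1)). The abstract does not state which effect estimates were used, so the function name `e_value` and the input values are assumptions for illustration, and the input is assumed to already be a risk ratio.

```python
from math import sqrt


def e_value(rr):
    """E-value for a point estimate on the risk-ratio scale.

    Hypothetical helper for illustration. Protective estimates
    (rr < 1) are inverted first, so the result is always >= 1.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # work with the reciprocal for protective effects
    return rr + sqrt(rr * (rr - 1))


# Made-up inputs, not taken from the study.
harmful = e_value(2.0)      # 2 + sqrt(2), about 3.41
protective = e_value(0.5)   # same value, by symmetry
```

A larger E-value means a stronger unmeasured confounder would be needed to nullify the finding, which is the sense in which the analysis "suggested robustness".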
Conclusions: This study is the first to utilize VOC exposure data to construct an ML model for predicting CVD risk. The findings highlight the potential of combining environmental exposure data with demographic information to enhance CVD risk prediction, supporting the development of personalized prevention and intervention strategies.
Competing Interests: The authors declare no competing interests.
(Copyright © 2024 The Authors. Published by Elsevier Inc. All rights reserved.)
Database: MEDLINE