Machine Learning Estimation of Low-Density Lipoprotein Cholesterol in Women With and Without HIV
Author: | Sanjay Rajagopalan, Chris T. Longenecker, Tony Dong, Chang H. Kim, Mariam N. Rana, Sadeer G. Al-Kindi |
---|---|
Publication year: | 2022 |
Subject: |
Adult; Coefficient of determination; Mean squared error; HIV Infections; Machine learning; Linear regression; Humans; Pharmacology (medical); Triglycerides; Mathematics; Estimation; Artificial neural network; Cholesterol, HDL; Cholesterol, LDL; Middle Aged; Random forest; Support vector machine; Infectious Diseases; Female; Gradient boosting; Artificial intelligence |
Source: | JAIDS Journal of Acquired Immune Deficiency Syndromes. 89:318-323 |
ISSN: | 1525-4135 |
DOI: | 10.1097/qai.0000000000002869 |
Description: | Low-density lipoprotein cholesterol (LDL-C) is typically estimated from total cholesterol, high-density lipoprotein cholesterol, and triglycerides. The Friedewald, Martin-Hopkins, and National Institutes of Health equations are widely used but may estimate LDL-C inaccurately in certain patient populations, such as those with HIV. We sought to investigate the utility of machine learning for LDL-C estimation in a large cohort of women with and without HIV. We identified 7397 direct LDL-C measurements (5219 from HIV-infected individuals, 2127 from uninfected controls, and 51 from seroconvertors) from 2414 participants (age 39.4 ± 9.3 years) in the Women's Interagency HIV Study and estimated LDL-C using the Friedewald, Martin-Hopkins, and National Institutes of Health equations. We also optimized 5 machine learning methods [linear regression, random forest, gradient boosting, support vector machine (SVM), and neural network] using 80% of the data (training set). We compared the performance of each method using root mean square error, mean absolute error, and coefficient of determination (R²) in the holdout (20%) set. SVM outperformed all 3 existing equations and the other machine learning methods, achieving the lowest root mean square error and mean absolute error, and the highest R² (11.79 mg/dL, 7.98 mg/dL, and 0.87, respectively, compared with 12.45 mg/dL, 9.14 mg/dL, and 0.87 for the Friedewald equation). SVM performance remained superior in subgroups with and without HIV, with nonfasting measurements, and with LDL <70 mg/dL or triglycerides >400 mg/dL. In this proof-of-concept study, SVM is a robust method that predicts directly measured LDL-C more accurately than clinically used methods in women with and without HIV. Further studies should explore the utility in broader populations. |
Database: | OpenAIRE |
External link: |
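
The abstract compares equation-based LDL-C estimates against direct measurements using root mean square error, mean absolute error, and R². The sketch below illustrates that kind of comparison for the Friedewald equation only (LDL-C = total cholesterol − HDL-C − triglycerides/5, all in mg/dL). The function names and the three lipid panels are hypothetical and made up for illustration; they are not study data, and the R² definition used here (1 − SS_res/SS_tot) is an assumption, since the record does not state how the authors computed it.

```python
import math


def friedewald_ldl(total_chol, hdl, triglycerides):
    """Friedewald estimate of LDL-C; all inputs and the result are in mg/dL."""
    return total_chol - hdl - triglycerides / 5.0


def rmse(y_true, y_pred):
    """Root mean square error between measured and estimated values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between measured and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical panels: (total cholesterol, HDL-C, triglycerides), in mg/dL.
lipid_panels = [(200, 50, 150), (180, 60, 100), (240, 40, 200)]
# Hypothetical directly measured LDL-C values for the same panels, in mg/dL.
direct_ldl = [118.0, 102.0, 165.0]

estimated = [friedewald_ldl(*panel) for panel in lipid_panels]

print("Friedewald estimates:", estimated)
print("RMSE:", round(rmse(direct_ldl, estimated), 2))
print("MAE:", round(mae(direct_ldl, estimated), 2))
print("R^2:", round(r_squared(direct_ldl, estimated), 3))
```

A machine learning model such as the SVM described in the abstract would be scored the same way: its holdout-set predictions would take the place of `estimated` in the three metric calls.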