A Machine Learning Model for the Prediction of COVID-19 Severity Using RNA-Seq, Clinical, and Co-Morbidity Data.

Authors: Sethi S (Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68105, USA); Shakyawar S (Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68105, USA); Reddy AS (Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA); Patel JC (Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68105, USA); Guda C (Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68105, USA)
Language: English
Source: Diagnostics (Basel, Switzerland) [Diagnostics (Basel)] 2024 Jun 18; Vol. 14 (12). Date of Electronic Publication: 2024 Jun 18.
DOI: 10.3390/diagnostics14121284
Abstract: This study was motivated by the need to understand SARS-CoV-2 infections at the molecular level and to develop predictive tools for managing COVID-19 severity. Given the varied clinical outcomes observed among infected individuals, a reliable machine learning (ML) model for predicting COVID-19 severity is needed. Despite the availability of large-scale genomic and clinical data, previous studies have not effectively used multi-modality data for data-driven prediction of disease severity. Our primary goal was to predict COVID-19 severity using an ML model trained on a combination of patients' gene expression, clinical features, and co-morbidity data. We evaluated several ML algorithms, including Logistic Regression (LR), XGBoost (XG), Naïve Bayes (NB), and Support Vector Machine (SVM), together with feature selection methods, to identify the best-performing model for disease severity prediction. XG was the best classifier, distinguishing severity groups with 95% accuracy and an AUC (Area Under the Curve) of 0.99. Additionally, SHAP analysis revealed the features contributing most to prediction, including several genes such as COX14, LAMB2, DOLK, SDCBP2, RHBDL1, and IER3-AS1. Notably, two clinical features, the absolute neutrophil count and Viremia Categories, emerged as top contributors. Integrating multiple data modalities significantly improved the accuracy of disease severity prediction compared with using any single modality. The identified features could serve as biomarkers for COVID-19 prognosis and patient care, allowing clinicians to optimize treatment strategies and refine clinical decision-making for better patient outcomes.
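Note: The abstract reports classifier performance as accuracy and AUC. As a minimal sketch of how these two metrics are commonly defined for a binary severity label, the following may help; the function names and the toy labels and scores are hypothetical and are not taken from the study, whose evaluation code is not part of this record.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in y_score]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = severe, 0 = non-severe) and model scores,
# for illustration only; these are NOT data from the paper.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(f"accuracy = {accuracy(labels, scores):.3f}")
print(f"AUC      = {roc_auc(labels, scores):.3f}")
```

Accuracy depends on the chosen decision threshold, while AUC summarizes ranking quality across all thresholds, which is why the two values can differ for the same model.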
Database: MEDLINE