Smoking Classification Using Novel Plasma Cytokines by implementing Machine Learning and Statistical Methods.

Authors: Saharan SS; Department of Clinical Pharmacy, University of California, San Francisco, USA.; UCSF Kane Lab, San Francisco, USA.; UC Berkeley Extension, Berkeley, USA., Nagar P; Department of Statistics, University of Rajasthan, Jaipur, India., Creasy KT; Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA., Stock EO; Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA., Feng J; Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA., Malloy MJ; Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA., Kane JP; Cardiovascular Research Institute, Department of Medicine, University of California, San Francisco, USA.
Language: English
Source: Proceedings. International Conference on Computational Science and Computational Intelligence [Proc (Int Conf Comput Sci Comput Intell)] 2023 Dec; Vol. 2023, pp. 686-694. Date of Electronic Publication: 2024 Jul 19.
DOI: 10.1109/csci62032.2023.00118
Abstract: Smoking is a major cause of premature and preventable death. Tobacco exposure has a detrimental effect on many organs and contributes to multiple diseases, including chronic obstructive pulmonary disease (COPD), cardiovascular disease, cancer, and diabetes. Cytokines are inflammatory biomarkers that are mechanistically associated with smoking. Machine learning algorithms allow for the quantitative assessment of the contributions of individual cytokines to tobacco-related diseases. The mapping of cytokines to disease can facilitate and direct treatment modalities. By applying the k-Nearest Neighbor (k-NN) and Random Forest machine learning algorithms to 63 plasma cytokines, we demonstrate the classification of smoking status. To ensure optimal results, performance improvement techniques such as k-fold cross-validation and hyperparameter tuning are employed. The separability achieved by the models is evaluated using the Area Under the Receiver Operating Characteristic (AUROC) metric. The most significant cytokines that enabled the classification are identified and presented. Whether the AUROC scores of k-NN and Random Forest differ significantly is assessed with a two-sample independent t-test. The k-NN algorithm achieved reasonably good classification performance, with an AUROC of 0.87 and a 95% CI of (0.823, 0.917). Random Forest exceeded the k-NN algorithm's performance, with a perfect AUROC score of 1 and a 95% CI of (1, 1). Among the ten most prominent cytokines that contributed to the classification, those common to both algorithms are LIF, IL22, G-CSF/CSF-3, and TRAIL. The AUROC scores for k-NN and Random Forest are significantly different (p-value = 5.105e-16). The discovery and transference of biomarkers such as cytokines from the platform of molecular investigation to clinical practice can facilitate precision medicine-based therapeutic interventions.
Competing Interests: The authors report no conflicts of interest.
Database: MEDLINE
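
Illustrative sketch (not part of the record): the evaluation steps named in the abstract (k-NN classification, k-fold cross-validation, hyperparameter tuning, AUROC, and a two-sample independent t-test) can be outlined in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The data are synthetic (3 made-up features rather than the 63 plasma cytokines), the function names `auroc`, `knn_score`, `kfold_auroc` and `pooled_t` are hypothetical, the candidate values of k are arbitrary, Random Forest is omitted for brevity, and the pooled-variance form of the t statistic is an assumption because the abstract does not say which variant was used.

```python
import math
import random
from statistics import mean, variance


def auroc(labels, scores):
    """AUROC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train_X, train_y, x, k):
    """Fraction of the k nearest training samples (Euclidean distance) labelled 1."""
    nearest = sorted(zip(train_X, train_y), key=lambda t: math.dist(t[0], x))[:k]
    return sum(lab for _, lab in nearest) / k


def kfold_auroc(X, y, k_neighbors, n_folds=5, seed=0):
    """Per-fold AUROC of k-NN under stratified k-fold cross-validation."""
    rng = random.Random(seed)  # same seed -> same folds for every k_neighbors
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [pos[f::n_folds] + neg[f::n_folds] for f in range(n_folds)]
    results = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        scores = [knn_score(train_X, train_y, X[j], k_neighbors) for j in fold]
        results.append(auroc([y[j] for j in fold], scores))
    return results


def pooled_t(a, b):
    """Two-sample independent t statistic with pooled variance (assumed variant)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    if se == 0:  # both samples constant: statistic is 0 or unbounded
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


# Synthetic stand-in data: 40 samples per class, class 1 shifted by one unit.
data_rng = random.Random(0)
X = [[data_rng.gauss(float(lab), 1.0) for _ in range(3)] for lab in (0, 1) for _ in range(40)]
y = [lab for lab in (0, 1) for _ in range(40)]

# Hyperparameter tuning: pick the neighbour count with the best mean AUROC.
fold_aucs = {k: kfold_auroc(X, y, k) for k in (1, 3, 5, 7)}
best_k = max(fold_aucs, key=lambda k: mean(fold_aucs[k]))
print("best k:", best_k, "mean AUROC:", round(mean(fold_aucs[best_k]), 3))

# Compare the per-fold AUROC of two configurations with the t statistic.
print("t statistic (k=1 vs k=7):", pooled_t(fold_aucs[1], fold_aucs[7]))
```

The sketch reports only the t statistic; turning it into a p-value needs the t distribution with na + nb - 2 degrees of freedom, which a statistics library would provide.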