COVID-19 Likelihood Meter: a machine learning approach to COVID-19 screening for Indonesian health workers

Authors:	Levana Sani, Nurul Luntungan, Adrianna Bella, Panji Fortuna Hadisoemarto, Dimitri Swashtika, Olivia Herlinda, Diah S. Saminarsih, Muhammad Aji Muharrom, Astrid Irwanto, Akmal Taher, Joseph L. Greenstein, Shreyash Sonthalia
Year of publication:	2021
Subject:	Coronavirus disease 2019 (COVID-19), Computer science, Population, Machine learning, Random forest, Indonesian, Brier score, Test set, Classifier, Generalizability theory, Artificial intelligence
DOI:	10.1101/2021.10.15.21265021
Description:	The COVID-19 pandemic poses a heightened risk to health workers, especially in low- and middle-income countries such as Indonesia. Because mass RT-PCR testing of health workers is difficult to implement, high-performing and cost-effective methods are needed to help identify COVID-19-positive health workers and protect those at the forefront of the fight against the pandemic. This study investigated the use of machine learning classifiers to predict the risk of COVID-19 positivity (by RT-PCR) from data collected through a survey designed specifically for health workers. Machine learning tools can enhance COVID-19 screening capacity in high-risk populations such as health workers, in settings where cost limits access to adequate testing and screening supplies. We built two sets of COVID-19 Likelihood Meter (CLM) models: one trained and tested on data from a broad population of health workers in Jakarta and Semarang (full model), and one trained on health workers from Jakarta only (Jakarta model) and tested on an independent population of Semarang health workers. Model performance was assessed with the area under the receiver operating characteristic curve (AUC), average precision (AP), and the Brier score (BS). Shapley additive explanations (SHAP) were used to analyze feature importance. The final dataset included 3979 health workers. For the full model, the random forest was selected as the algorithm of choice. It achieved a cross-validation mean AUC of 0.818 ± 0.022 and AP of 0.449 ± 0.028, and performed well on the test set, with an AUC of 0.831 and an AP of 0.428. The random forest model was well calibrated, with a low mean Brier score of 0.122 ± 0.004.
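The three performance metrics named above can be computed as in the following minimal, dependency-free Python sketch. It is not the authors' code: the function names `roc_auc`, `average_precision`, and `brier_score` are hypothetical. The AP formula assumes the non-interpolated, step-wise definition AP = Σ (R_n − R_{n−1}) · P_n. The toy labels and scores are illustrative only and are not study data.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Non-interpolated AP: sum of (recall step) * precision over descending score thresholds."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP needs at least one positive")
    pairs = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Tied scores form a single threshold step.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]          # toy RT-PCR outcomes (1 = positive)
    probs = [0.1, 0.4, 0.35, 0.8]  # toy predicted probabilities
    print(roc_auc(labels, probs), average_precision(labels, probs), brier_score(labels, probs))
```

In practice, scikit-learn's `roc_auc_score`, `average_precision_score`, and `brier_score_loss` compute the same quantities. Unlike AUC, whose chance level is 0.5, AP has a chance level equal to the positive rate. AP values should therefore be read against the prevalence of RT-PCR positivity in the data, not against 1.0.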
A random forest classifier was the best-performing model during cross-validation for the Jakarta dataset, with an AUC of 0.824 ± 0.008, AP of 0.397 ± 0.019, and BS of 0.102 ± 0.007. However, the extra trees classifier was selected as the model of choice because it generalized better to the test set. When tested on the independent set of Semarang health workers, the extra trees model achieved an AUC of 0.672 and an AP of 0.508. Our models yielded high predictive performance and may be useful both as a COVID-19 screening tool and as a way to identify the health workers at greatest risk of COVID-19 positivity, who are therefore most in need of testing.
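The two evaluation designs described above differ in how data are split. The full model uses a test set drawn from the pooled population, while the Jakarta model uses a held-out site. The Python sketch below illustrates both designs under stated assumptions. Each record is a dict with a hypothetical `site` field. The full model's test set is assumed to be a simple random hold-out, because the record does not say how it was drawn. The function names are hypothetical.

```python
import random
from typing import Any, Dict, List, Sequence, Set, Tuple

Record = Dict[str, Any]  # hypothetical: one survey response with a "site" key


def random_holdout(
    records: Sequence[Record], test_fraction: float, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Internal validation (assumed design): shuffle pooled data, hold out a fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def site_holdout(
    records: Sequence[Record], train_sites: Set[str], test_sites: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """External validation: train and test on disjoint sites (e.g. Jakarta vs Semarang)."""
    if train_sites & test_sites:
        raise ValueError("training and test sites must not overlap")
    train = [r for r in records if r["site"] in train_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test


if __name__ == "__main__":
    data = [
        {"site": "Jakarta", "positive": 1},
        {"site": "Jakarta", "positive": 0},
        {"site": "Semarang", "positive": 0},
        {"site": "Semarang", "positive": 1},
    ]
    train, test = site_holdout(data, {"Jakarta"}, {"Semarang"})
    print(len(train), len(test))
```

A random hold-out measures performance on the same population the model was trained on. A site hold-out is the stricter check of generalizability to a new population, which is the property cited above for preferring the extra trees model.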
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::355670a71bd6659400b93004f0e0a43c https://doi.org/10.1101/2021.10.15.21265021