Smart Pooling: AI-powered COVID-19 testing

Authors:	Pablo Arbeláez, Andrés L. Medaglia, Maria F. Roa, Jorge Madrid-Wolff, Mauricio Velasco, Catalina Gómez, Guillaume Jeanneret, Marcela Guevara-Suarez, Olga L. Sarmiento, María Escobar, Laura Bravo-Sánchez, Silvia Restrepo, Diego Valderrama, Manu Forero-Shelton, Julián Martínez, Angela Castillo, Juan Manuel Pedraza-Leal, Martha L. Cepeda
Year of publication:	2020
Subject:	Actuarial science; Computer science; Pandemic; Pooling; Prevalence; Context (language use); Sample (statistics); Disease; Set (psychology); Test (assessment)
DOI:	10.1101/2020.07.13.20152983
Description:	Summary
Background: COVID-19 is an acute respiratory illness caused by the novel coronavirus SARS-CoV-2. The disease has rapidly spread to most countries and territories and had caused 14·2 million confirmed infections and 602,037 deaths as of July 19, 2020. Massive molecular testing for COVID-19 has been identified as fundamental to moderating the spread of the disease. Pooling methods can enhance testing efficiency, but they are viable only at very low incidences of the disease. We propose Smart Pooling, a machine learning method that uses clinical and sociodemographic data from patients to increase the efficiency of pooled molecular testing for COVID-19 by arranging samples into all-negative pools.
Methods: We developed machine learning methods that estimate the probability that a sample will test positive for SARS-CoV-2 based on complementary information from the sample. We use these predictions to exclude samples predicted as positive from pools. We trained our machine learning methods on samples from more than 8,000 patients tested for SARS-CoV-2 from April to July in Bogotá, Colombia.
Findings: Our method, Smart Pooling, shows an efficiency of 306% at a disease prevalence of 5% and an efficiency of 107% at a disease prevalence of up to 50%, a regime in which two-stage pooling offers marginal efficiency gains compared to individual testing (see Figure 1). Additionally, we calculate the possible efficiency gains of one- and two-dimensional two-stage pooling strategies, and present the optimal strategies for disease prevalences up to 25%. We discuss the practical limitations of conducting pooling in the laboratory.
Interpretation: Pooled testing has been a theoretically alluring option to increase the coverage of diagnostics since it was proposed by Dorfman during World War II. Although there are examples of pooled testing being used successfully to reduce the cost of diagnostics, its applicability has remained limited because efficiency drops rapidly as prevalence increases.
Not only does our method provide a cost-effective solution to increase the coverage of testing amid the COVID-19 pandemic, but it also demonstrates that artificial intelligence can complement well-established techniques in medical practice.
Funding: Faculty of Engineering, Universidad de los Andes, Colombia.
Research in context
Evidence before this study: The acute respiratory illness COVID-19 is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The World Health Organization (WHO) labeled COVID-19 a pandemic in March 2020. Reports from February 2020 indicated the possibility of asymptomatic transmission of the virus, which has called for molecular testing to identify carriers of the disease and prevent them from spreading it. The dramatic rise in the global need for molecular testing has made reagents scarce. Pooling strategies for massive diagnostics were initially proposed to diagnose syphilis during World War II, but have not yet seen widespread use, mainly because their efficiency falls even at modest disease prevalence. We searched PubMed, BioRxiv, and MedRxiv for articles published in English from inception to July 15, 2020, using the keywords “pooling”, “testing” AND “COVID-19”, AND “machine learning” OR “artificial intelligence”. Early studies of pooled molecular testing for SARS-CoV-2 revealed the possibility of detecting single positive samples in dilutions of samples from up to 32 individuals. The first reports of pooled testing came in March from Germany and the USA. These works suggested that it was feasible to conduct pooled testing as long as the prevalence of the disease was low. Numerous theoretical works have focused only on finding or adapting the ideal pooling strategy to the prevalence of the disease. Nonetheless, many do not consider the practical limitations of implementing these strategies.
Reports from May 2020 indicated that it was feasible to predict an individual’s status with machine learning methods based on reported symptoms.
Added value of this study: We show how artificial intelligence methods can be used to enhance, but not replace, existing well-proven methods, such as diagnostics by qPCR. We show that, used in this fashion, pooled testing can yield efficiency gains even as prevalence increases. Our method does not compromise the sensitivity or specificity of the diagnostics, as these are still given by the molecular test. The artificial intelligence models are simple, and we make them free to use. Remarkably, artificial intelligence methods can continuously learn from every set of samples and thus increase their performance over time.
Implications of all the available evidence: Using artificial intelligence to enhance rather than replace molecular testing can make pooled testing feasible, even as disease incidence rises. This approach could make pooled testing an effective tool to tackle the disease’s progression, particularly in territories with limited resources.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::38739f195a148a3ee825a7b2a2e764b9 https://doi.org/10.1101/2020.07.13.20152983