Prediction of SARS-CoV-2-positivity from million-scale complete blood counts using machine learning.

Authors: Zuin G; Universidade Federal de Minas Gerais, CS Dept., Belo Horizonte, Brazil.; Kunumi, Belo Horizonte, Brazil., Araujo D; Universidade Federal de Minas Gerais, CS Dept., Belo Horizonte, Brazil.; Huna, São Paulo, Brazil., Ribeiro V; Huna, São Paulo, Brazil., Seiler MG; Kunumi, Belo Horizonte, Brazil., Prieto WH; Grupo Fleury, São Paulo, Brazil., Pintão MC; Grupo Fleury, São Paulo, Brazil., Dos Santos Lazari C; Grupo Fleury, São Paulo, Brazil., Granato CFH; Grupo Fleury, São Paulo, Brazil., Veloso A; Universidade Federal de Minas Gerais, CS Dept., Belo Horizonte, Brazil.
Language: English
Source: Communications medicine [Commun Med (Lond)] 2022 Jun 15; Vol. 2, pp. 72. Date of Electronic Publication: 2022 Jun 15 (Print Publication: 2022).
DOI: 10.1038/s43856-022-00129-0
Abstract: Background: The Complete Blood Count (CBC) is a commonly used, low-cost test that measures the white blood cells, red blood cells, and platelets in a person's blood. It is a useful tool for supporting medical decisions, because variations in each analyte can provide relevant insight into potential diseases. In this study, we aimed to develop machine learning models for COVID-19 diagnosis from CBCs, exploiting the predictive power of non-linear relationships among multiple blood analytes.
Methods: We collected 809,254 CBCs and 1,088,385 RT-PCR tests for SARS-CoV-2, of which 21% (234,466) were positive, from 900,220 unique individuals. To screen for COVID-19 properly, we also collected 120,807 CBCs from 16,940 individuals who tested positive for other respiratory viruses. We proposed an ensemble procedure that combines machine learning models for different respiratory infections, and we analyzed the results in both the first and second waves of COVID-19 cases in Brazil.
Results: We obtain an AUROC above 90% in validations for both scenarios. We show that models built solely from SARS-CoV-2 data are biased and perform poorly in the presence of infections caused by other RNA respiratory viruses.
Conclusions: We demonstrate the potential of a novel machine learning approach for COVID-19 diagnosis based on a CBC, and we show that aggregating information about other respiratory diseases was essential to ensure robust results. Given its versatility, low cost, and speed, we believe that our tool can be particularly useful in a variety of scenarios, both during the pandemic and after it.
Competing Interests: The authors declare no competing interests.
(© The Author(s) 2022.)
Database: MEDLINE