Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.

Authors: Rancati S; Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy., Nicora G; Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy., Prosperi M; Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA.; Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA., Bellazzi R; Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy., Salemi M; Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA.; Department of Pathology, Immunology and Laboratory Medicine, College of Medicine, University of Florida, Gainesville, FL, USA., Marini S; Department of Epidemiology, College of Public Health and Health Professions, University of Florida, Gainesville, FL, USA.; Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA.
Language: English
Source: BioRxiv : the preprint server for biology [bioRxiv] 2024 Sep 26. Date of Electronic Publication: 2024 Sep 26.
DOI: 10.1101/2023.10.24.563721
Abstract: The coronavirus disease of 2019 (COVID-19) pandemic is characterized by the sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, lineages, and sublineages that outcompete previously circulating ones because of, among other factors, increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute more than 10% of all the viral sequences added to the GISAID database in a given week. DeepAutoCoV is trained and validated by assembling global and country-specific data sets from over 16 million Spike protein sequences sampled over a period of about 4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01% to 3%), with median lead times of 4 to 17 weeks, and predicts FDLs ~5 and ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as an FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness, and may provide significant insights for the optimization of pre-emptive public health intervention strategies.
Database: MEDLINE
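
Note: the abstract defines a future dominant lineage (FDL) as a (sub)lineage that exceeds 10% of all sequences added to GISAID in a given week. The following is a minimal sketch of that labelling rule only, not the authors' implementation and not the autoencoder itself. The function name `first_dominant_week`, the `(week, lineage)` record format, the integer week indices, and the toy counts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def first_dominant_week(records, threshold=0.10):
    """Return {lineage: first week in which its weekly share exceeds threshold}.

    `records` is an iterable of (week, lineage) pairs, one per submitted
    sequence. Both the record format and the function name are hypothetical.
    """
    # Count sequences per lineage within each week.
    weekly_counts = defaultdict(Counter)
    for week, lineage in records:
        weekly_counts[week][lineage] += 1

    first_week = {}
    for week in sorted(weekly_counts):
        total = sum(weekly_counts[week].values())
        for lineage, count in weekly_counts[week].items():
            # Strictly "more than 10%", following the abstract's wording.
            if count / total > threshold and lineage not in first_week:
                first_week[lineage] = week
    return first_week


# Toy data (invented): lineage "B" is 5% of week 1 and 20% of week 2.
demo = (
    [(1, "A")] * 95 + [(1, "B")] * 5
    + [(2, "A")] * 80 + [(2, "B")] * 20
)
result = first_dominant_week(demo)
```

With these toy counts, "A" crosses the threshold in week 1 and "B" in week 2. Under this reading, a lead time would be the gap between the week a lineage is first flagged by the detector and the week returned here.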