An Interpretable Deep Learning Model for Predicting the Risk of Severe COVID-19 from Spike Protein Sequence

Authors: Bahrad A. Sokhansanj, Zhengqiao Zhao, Gail L. Rosen
Year of publication: 2022
DOI: 10.21203/rs.3.rs-1234007/v1
Description: Throughout the COVID-19 pandemic, the virus has mutated in ways that affect its ability to infect people, cause severe disease, and escape immunity. Studying viral mutations experimentally can be costly and time-consuming. Sequencing genetic code is cheaper, and millions of SARS-CoV-2 genome sequences are available. Given the quickly changing dynamics of SARS-CoV-2 evolution and patient outcomes, we need fast ways to translate sequence data into biologically meaningful and clinically relevant information. Inspired by advances in natural language processing, we design a deep learning architecture that can be visualized at multiple scales to interpret trained models. We train a model to predict the risk of severe disease based on genetic changes in the SARS-CoV-2 spike protein, which plays a key role in infection and immune response. Trained solely on spike protein sequences from pre-Omicron infections (i.e., acquired before any empirical data for Omicron was available), the model predicts that Omicron sequences carry a reduced risk of severe disease (by 40-50%) relative to Delta. When tested on Omicron sequences collected so far, the deep learning model's predictions agree with real-world observations, suggesting that the methodology can be applied to future variants.
Database: OpenAIRE