Description: |
We present a novel framework for predicting whether a newly emerging lineage threatens to eventually dominate the viral population. The framework relies solely on genomic sequence data and requires no previously established biological analysis. Its building blocks are motifs: sets of co-evolving sites in the alignment, identified via co-evolutionary signals. Together, these motifs form a relational structure over the polymorphic sites. Motifs are constructed from pairwise distances that quantify the co-evolutionary coupling between sites, and they manifest as clusters of co-evolving sites. Building on this notion of relational structure, we present an approach to genomic surveillance: the system issues an alert for a lineage based on that lineage's contribution to drastic changes in the relational structure. We then conduct a comprehensive retrospective analysis of the COVID-19 pandemic using SARS-CoV-2 genomic sequence data from GISAID spanning October 2020 to September 2022, covering 21 lineages and 27 countries at weekly resolution. We evaluate the surveillance system in terms of accuracy, timeliness, and robustness. Finally, we examine how well the system classifies each lineage. |
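The abstract does not specify the coupling distance, the clustering rule, or the change measure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes (hypothetically) that coupling between two polymorphic columns is measured as `1 - MI / max(H_a, H_b)` (normalized mutual information), that motifs are formed by single-linkage grouping of sites whose distance is at or below a `threshold`, and that a change in relational structure is scored as one minus the Jaccard similarity of co-clustered site pairs between two time windows. All function names (`coupling_distance`, `find_motifs`, `structure_change`) and the threshold value are illustrative.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(col):
    """Shannon entropy (nats) of one alignment column."""
    n = len(col)
    return -sum((c / n) * math.log(c / n) for c in Counter(col).values())


def mutual_information(col_a, col_b):
    """Empirical mutual information (nats) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum((c / n) * math.log(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())


def coupling_distance(col_a, col_b):
    """Assumed distance: 0 = perfectly coupled sites, 1 = independent sites."""
    h = max(entropy(col_a), entropy(col_b))
    if h == 0:
        return 1.0  # monomorphic columns carry no co-evolutionary signal
    return 1.0 - mutual_information(col_a, col_b) / h


def find_motifs(alignment, threshold=0.2):
    """Group polymorphic sites into motifs by single linkage on coupling distance."""
    columns = [[seq[i] for seq in alignment] for i in range(len(alignment[0]))]
    polymorphic = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    parent = {i: i for i in polymorphic}  # union-find over site indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(polymorphic, 2):
        if coupling_distance(columns[i], columns[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in polymorphic:
        clusters.setdefault(find(i), []).append(i)
    # Keep only clusters of at least two sites; sort for a stable result.
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


def structure_change(motifs_before, motifs_after):
    """Assumed change score: 1 - Jaccard similarity of co-clustered site pairs."""
    def pairs(motifs):
        return {p for m in motifs for p in combinations(sorted(m), 2)}

    a, b = pairs(motifs_before), pairs(motifs_after)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


example = ["ACGTA", "ACGTA", "TCATA", "TCATA", "ACGTC", "TCATC"]
print(find_motifs(example))  # sites 0 and 2 always change together
```

In this toy alignment, sites 0 and 2 vary in lockstep while site 4 varies independently of both, so only sites 0 and 2 form a motif. An alerting rule could then flag a time window whose `structure_change` score against the previous window exceeds some chosen cutoff; attributing that change to a particular lineage, as the abstract describes, would require an additional step not sketched here.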