Determining a novel feature-space for SARS-CoV-2 sequence data

Authors: Didier Barradas Bautista, Francesco Ballesio, Andrea Guarracino, Lukas Heumos, Fotis Psomopoulos, Ali Haider Bangash, Anastasios Togkousidis, Marco Pietrosanto, Justin Barton, Phillip Davis, Aneesh Panoli
Year of publication: 2020
DOI: 10.37044/osf.io/xt7gw
Description: The pandemic spread of SARS-CoV-2 and its ability to reinfect a recovered individual, among other damaging characteristics, took everybody by surprise. An urgent, global, collaborative scientific effort was needed to bring together experts from different areas of medicine and data science. The COVID-19 Virtual BioHackathon, held from 5 to 11 April 2020, provided such a platform for addressing pressing related problems ranging from text mining to genomics. Under the "Machine learning" track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations of protein sequences to build phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinities to aid vaccine design. All related work is available in a GitHub repository under the MIT license for future research.
Database: OpenAIRE
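The description mentions k-mer feature extraction. The following is a minimal sketch of that general idea, assuming nucleotide input over the ACGT alphabet. It is not taken from the project's repository. Each sequence is mapped to a vector of overlapping k-mer counts over a fixed vocabulary of 4^k entries, so vectors from different sequences are directly comparable. The function name `kmer_features` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_features(seq: str, k: int, alphabet: str = "ACGT") -> list[int]:
    """Hypothetical helper: count overlapping k-mers in seq over a fixed vocabulary."""
    seq = seq.upper()
    # Slide a window of width k across the sequence, one position at a time
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all |alphabet|^k k-mers in lexicographic order so every
    # sequence yields a vector with the same length and column meaning;
    # k-mers containing other symbols (e.g. N) are simply not counted
    vocab = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts.get(kmer, 0) for kmer in vocab]


if __name__ == "__main__":
    vec = kmer_features("ACGTAC", 2)
    print(len(vec), vec)
```

The vector length grows as 4^k, while short k-mers carry little sequence-specific signal. This trade-off is why the choice of k matters for downstream models.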