Spike2CGR: an efficient method for spike sequence classification using chaos game representation.

Authors: Murad, Taslim; Ali, Sarwan; Khan, Imdadullah; Patterson, Murray
Subject:
Source: Machine Learning; Oct 2023, Vol. 112, Issue 10, pp. 3633-3658, 26 pp.
Abstract: Biological sequence classification is an essential task in many fields, such as machine learning, biology, and bioinformatics. Due to the spread of the coronavirus disease, a large volume of sequence data is available to researchers. As this virus has affected many hosts (e.g., bats, humans, and chickens) and evolved into different lineages/variants (e.g., alpha, beta, gamma, and omicron), the biological sequence data for this virus is accompanied by information about the affected host and the lineage type of each sequence. Moreover, it is well known in the biology domain that a disproportionate share of coronavirus mutations occur in the spike protein region. Therefore, working with only spike sequences is usually sufficient, rather than using the full-length genome, for sequence analysis. For a spike sequence, this paper aims to design an image representation so that sophisticated image classification algorithms can be applied to coronavirus lineage and host classification. We propose a method based on the idea of chaos game representation (CGR), called Spike2CGR, which converts spike sequences into graphical form (images); those images are then used as input to deep learning (DL) models. We also use domain knowledge from biology to design a few modified versions of Spike2CGR that are biologically meaningful and outperform the state of the art (SOTA) in terms of predictive performance. We use different DL models to perform coronavirus lineage and host classification and report predictive results using various evaluation metrics. Using two real-world datasets, we show that Spike2CGR outperforms the SOTA method in terms of predictive performance on spike sequences for variant and host classification. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
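
The abstract describes turning a spike (protein) sequence into an image through chaos game representation. The following is a minimal, generic sketch of that idea and not the authors' Spike2CGR implementation: the 20 standard amino acids are placed on the vertices of a regular 20-gon, a point starts at the center and moves part of the way toward the vertex of each successive residue, and the visited positions are counted on a pixel grid. The names `AMINO_ACIDS`, `polygon_vertices`, and `cgr_image`, the vertex ordering, the grid size, and the `step` ratio of 0.5 are all assumptions made for illustration.

```python
import math

# 20 standard amino acids; this vertex ordering is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def polygon_vertices(n):
    """Return the n vertices of a regular n-gon on the unit circle."""
    return [
        (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def cgr_image(sequence, size=64, step=0.5):
    """Build a size x size count grid from a protein sequence (hypothetical helper).

    step is the fraction of the distance moved toward the residue's vertex
    at each iteration; 0.5 is the classic chaos game choice and is only an
    assumed default here.
    """
    vertices = dict(zip(AMINO_ACIDS, polygon_vertices(len(AMINO_ACIDS))))
    image = [[0] * size for _ in range(size)]
    x, y = 0.0, 0.0  # start at the center of the polygon
    for residue in sequence.upper():
        if residue not in vertices:
            continue  # skip gaps and ambiguous symbols such as 'X' or '-'
        vx, vy = vertices[residue]
        x += step * (vx - x)
        y += step * (vy - y)
        # Map coordinates in [-1, 1] to pixel indices in [0, size - 1].
        col = min(int((x + 1) / 2 * size), size - 1)
        row = min(int((1 - y) / 2 * size), size - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    grid = cgr_image("MFVFLVLLPLVSSQ", size=16)
    print(sum(map(sum, grid)))  # one count per recognized residue
```

A grid produced this way could be normalized and passed to an image classifier as a single-channel image; the biologically motivated variants mentioned in the abstract are not reproduced here.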