Deep Learning to Discover Cancer Glycome Genes Signifying the Origins of Cancer

Authors: Raihanul Bari Tanvir, Abdullah Al Mamun, Charles J. Dimitroff, Masrur Sobhan, Ananda Mohan Mondal
Year of publication: 2020
Subject:
Source: BIBM
DOI: 10.1109/bibm49941.2020.9313450
Description: Background: Aberrant protein glycosylation is a common feature of cancer and contributes to malignant behavior. However, how and to what extent the cellular glycome is involved in cancer development and progression remains undefined. The primary objective of this study is to identify, in silico, glycome genes that could reveal a signature of cancer using expression profiles of cancer genomes. A list of $\sim 500$ glycome genes spanning several molecular categories already exists. This study is based on the hypothesis that if glycosylation is a common feature of cancer, there exists a shortlist of cancer glycome genes whose expression profiles carry a signature capable of differentiating the 33 cancer types available in The Cancer Genome Atlas (TCGA). Method: The distribution of cancer samples in TCGA is highly imbalanced, ranging from 36 for Cholangiocarcinoma (CHOL) to 1089 for Breast Cancer (BRCA). Supervised feature selection approaches to identifying the signature genes would therefore be biased toward the larger groups. We developed a computational framework using the concrete autoencoder (CAE), a deep learning-based unsupervised feature selection algorithm, to find the cancer-related glycome genes. The criteria for an optimal feature subset in this study are (a) the number of features should be as small as possible, and (b) the classification accuracy using the selected features should be >90%. Results: Our experiments yielded a shortlist of 132 glycome genes that can differentiate 33 different cancers with an accuracy of 92%. This study indicates that the cancer glycome genes signify the origins of cancer.
Database: OpenAIRE