EMLI-ICC: an ensemble machine learning-based integration algorithm for metastasis prediction and risk stratification in intrahepatic cholangiocarcinoma

Authors: Jian Ruan, Shuaishuai Xu, Ruyin Chen, Wenxin Qu, Qiong Li, Chanqi Ye, Wei Wu, Qi Jiang, Feifei Yan, Enhui Shen, Qinjie Chu, Yunlu Jia, Xiaochen Zhang, Wenguang Fu, Jinzhang Chen, Michael P Timko, Peng Zhao, Longjiang Fan, Yifei Shen
Year of publication: 2022
Subject:
Source: Briefings in bioinformatics. 23(6)
ISSN: 1477-4054
Description: Robust strategies to identify patients at high risk for tumor metastasis, which is frequently observed in intrahepatic cholangiocarcinoma (ICC), remain limited. While gene/protein expression profiling holds great potential as an approach to cancer diagnosis and prognosis, previously developed protocols that use multiple diagnostic signatures for expression-based metastasis prediction have not been widely adopted, because batch effects and differences between data types greatly reduce the predictive performance of expression profile-based signatures in interlaboratory and cross-data-type validation. To address this problem and assist in more precise diagnosis, we performed a genome-wide integrative proteome and transcriptome analysis and developed an ensemble machine learning-based integration algorithm for metastasis prediction (EMLI-Metastasis) and risk stratification (EMLI-Prognosis) in ICC. Based on large proteome (216) and transcriptome (244) data sets, 132 feature (biomarker) genes were selected and used to train the EMLI-Metastasis algorithm. To accurately detect metastasis in ICC patients, we developed a weighted ensemble machine learning method based on the k-Top Scoring Pairs (k-TSP) method. This approach generates one metastasis classifier for each bootstrap aggregating (bagging) training data set. Ten binary expression rank-based classifiers were generated in this way, each detecting metastasis separately. To further improve accuracy, the 10 binary metastasis classifiers were combined by weighted voting based on the score from the prediction results of each classifier. The EMLI-Metastasis algorithm reached a prediction accuracy of 97.1% and 85.0% on the proteome and transcriptome data sets, respectively. From the 132 feature genes, 21 gene-pair signatures were developed to establish a metastasis-related prognosis risk-stratification model in ICC (EMLI-Prognosis).
Based on the EMLI-Prognosis algorithm, patients in the high-risk group had significantly worse overall survival than those in the low-risk group in the clinical cohort (P-value
Database: OpenAIRE
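
The weighted k-TSP ensemble outlined in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of k, the use of out-of-bag accuracy as the voting weight, and the 0.5 decision threshold are all assumptions made for illustration, since the record does not specify them. Labels are assumed to be 0 (no metastasis) and 1 (metastasis), and `X` is a samples-by-genes expression matrix.

```python
import numpy as np

def fit_ktsp(X, y, k=3):
    """Select k disjoint gene pairs whose expression ordering best separates the classes."""
    # less[s, i, j] is True when gene i is expressed below gene j in sample s
    less = X[:, :, None] < X[:, None, :]
    # Score of pair (i, j): P(X_i < X_j | y=1) - P(X_i < X_j | y=0)
    score = less[y == 1].mean(axis=0) - less[y == 0].mean(axis=0)
    rows, cols = np.triu_indices(X.shape[1], k=1)
    order = np.argsort(-np.abs(score[rows, cols]), kind="stable")
    pairs, used = [], set()
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used or j in used:
            continue  # keep the selected pairs disjoint
        # Orient each pair so that "first gene < second gene" votes for class 1
        pairs.append((i, j) if score[i, j] > 0 else (j, i))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs

def predict_ktsp(pairs, X):
    """Majority vote over the rank comparisons of the selected pairs."""
    votes = np.stack([X[:, i] < X[:, j] for i, j in pairs], axis=1)
    return (votes.mean(axis=1) > 0.5).astype(int)

def fit_ensemble(X, y, n_classifiers=10, k=3, seed=0):
    """Train one k-TSP classifier per bootstrap sample and attach a voting weight."""
    rng = np.random.default_rng(seed)
    n = len(y)
    members = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, n, size=n)
        while len(np.unique(y[idx])) < 2:  # resample until both classes are present
            idx = rng.integers(0, n, size=n)
        pairs = fit_ktsp(X[idx], y[idx], k)
        # Assumption: weight = accuracy on out-of-bag samples
        # (falls back to the bootstrap sample itself if nothing is out of bag)
        oob = np.setdiff1d(np.arange(n), idx)
        eval_idx = oob if len(oob) else idx
        weight = float((predict_ktsp(pairs, X[eval_idx]) == y[eval_idx]).mean())
        members.append((pairs, weight))
    return members

def predict_ensemble(members, X):
    """Combine the member predictions by weighted voting."""
    weights = np.array([w for _, w in members])
    if weights.sum() == 0:
        weights = np.ones_like(weights)  # degenerate case: use an unweighted vote
    preds = np.stack([predict_ktsp(p, X) for p, _ in members], axis=0)
    score = (weights[:, None] * preds).sum(axis=0) / weights.sum()
    return (score >= 0.5).astype(int)  # ties are assigned to class 1
```

Because each classifier depends only on the relative ordering of gene pairs within a sample, its predictions are unchanged by any monotone per-sample rescaling of the expression values, which is one reason rank-based pair classifiers are attractive when batch effects and mixed data types are a concern.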