Highly Accurate Disease Diagnosis and Highly Reproducible Biomarker Identification with PathFormer

Authors: Dong, Zehao; Zhao, Qihang; Payne, Philip R. O.; Province, Michael A; Cruchaga, Carlos; Zhang, Muhan; Zhao, Tianyu; Chen, Yixin; Li, Fuhai
Year of publication: 2024
Subject:
Document type: Working Paper
DOI: 10.21203/rs.3.rs-3576068/v1
Description: Biomarker identification, for example by fold change and regression analysis, is critical for precise disease diagnosis and for understanding disease pathogenesis in omics data analysis. Graph neural networks (GNNs) have been the dominant deep learning model for analyzing graph-structured data. However, we found two major limitations of existing GNNs in omics data analysis: limited prediction (diagnosis) accuracy and limited reproducibility of biomarker identification across multiple datasets. The root of these challenges is the unique graph structure of biological signaling pathways, which consists of a large number of targets and intensive, complex signaling interactions among these targets. To resolve these two challenges, we present a novel GNN model architecture, named PathFormer, which systematically integrates signaling networks, prior knowledge, and omics data to rank biomarkers and predict disease diagnosis. In the comparison results, PathFormer significantly outperformed existing GNN models in both prediction accuracy (a 30% accuracy improvement in disease diagnosis) and reproducibility of biomarker ranking across different datasets. The improvement was confirmed using two independent Alzheimer's Disease (AD) and cancer transcriptomic datasets. The PathFormer model can be directly applied to other omics data analysis studies.
Database: arXiv