DeepHPV: a deep learning model to predict human papillomavirus integration sites
Author: | Mengyuan Li, Weiwen Fan, Zifeng Cui, Zhaoyue Huang, Canbiao Wu, Hongxian Xie, Xun Tian, Jingyue Wei, Fubing Yu, Zheng Hu, Jingjing Zhu, Chen Cao, Wei Xu, Gang Niu, Liang Jiuxing, Xiaofang Guo, Zeshan You, Weiling Xie, Jinfeng Tan, Weng Xuchu, Rui Tian, Zhiying Yu, Ping Zhou, Zhuang Jin |
---|---|
Year of publication: | 2020 |
Subject: | Computer science; Virus Integration; Uterine Cervical Neoplasms; Computational biology; Alphapapillomavirus; 03 medical and health sciences; Cervical carcinogenesis; Viral Proteins; 0302 clinical medicine; Deep Learning; HPV integration; Humans; Human papillomavirus; Molecular Biology; 030304 developmental biology; 0303 health sciences; Pan cancer; Models, Genetic; Papillomavirus Infections; Neoplasm Proteins; DNA binding site; 030220 oncology & carcinogenesis; Female; Artificial intelligence; Precision and recall; Information Systems |
Source: | Briefings in Bioinformatics, 22(4) |
ISSN: | 1477-4054 |
Description: | Integration of human papillomavirus (HPV) into the human genome is the main cause of cervical carcinogenesis. The choice of HPV integration sites depends strongly on the local genomic environment, which in principle makes these sites predictable. However, no published bioinformatic tool for predicting them has been available to date. We therefore developed DeepHPV, an attention-based deep learning model that predicts HPV integration sites by automatically learning features of the surrounding genomic environment. In total, 3608 known HPV integration sites were used to train the model, and 584 reviewed HPV integration sites served as the test dataset. DeepHPV achieved an area under the receiver operating characteristic curve (AUROC) of 0.6336 and an area under the precision-recall curve (AUPR) of 0.5670. Adding RepeatMasker peaks or TCGA Pan Cancer peaks improved performance to an AUROC of 0.8464 and 0.8501 and an AUPR of 0.7985 and 0.8106, respectively. We then tested these trained models on the independent database VISDB and found that the model with TCGA Pan Cancer peaks performed better (AUROC: 0.7175, AUPR: 0.6284) than the model with RepeatMasker peaks (AUROC: 0.6102, AUPR: 0.5577). Moreover, using the attention mechanism introduced in DeepHPV, we found transcription factor binding sites enriched near high-attention sites, including BHLHA15, CHR, COUP-TFII, DMRTA2, E2A, HIC1, INR, NPAS, Nr5a2, RARa, SCL, Snail1, Sox10, Sox3, Sox4, Sox6, STAT6, Tbet, Tbx5, TEAD, Tgif2, ZNF189 and ZNF416. Taken together, DeepHPV is a robust and explainable deep learning model that provides new insights into the preference and mechanism of HPV integration. Availability: DeepHPV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepHPV.git. Contact: huzheng1998@163.com, liangjiuxing@m.scnu.edu.cn, lizheyzy@163.com |
Database: | OpenAIRE |
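The record reports DeepHPV's performance as AUROC and AUPR. The sketch below is not taken from the DeepHPV repository; it only illustrates how these two metrics can be computed from per-site labels (1 = integration site, 0 = background) and model scores. The function names and the toy labels and scores are hypothetical. AUROC is computed here as the Mann-Whitney probability that a positive site outranks a negative one. AUPR is approximated by average precision, which is one common estimator and may differ slightly from the one the authors used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive (integration site)
    scores higher than a randomly chosen negative; ties count as 1/2.
    O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Ranks sites by descending score and averages the precision observed at
    each positive. Tied scores keep their input order (a simplification).
    """
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy data: 3 integration sites and 3 background sites.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    print(f"{auroc(labels, scores):.4f}")              # 0.7778
    print(f"{average_precision(labels, scores):.4f}")  # 0.8056
```

A random scorer gives an AUROC near 0.5 regardless of class balance, whereas the baseline for AUPR equals the fraction of positives, so the two numbers in the record should be read against different baselines.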