Single Concatenated Input is Better than Independent Multiple-input for CNNs to Predict Chemical-induced Disease Relation from Literature
Author: | Pham Thi Quynh Trang, Bui Manh Thang, Dang Thanh Hai |
---|---|
Year of publication: | 2020 |
Subject: | |
Source: | VNU Journal of Science: Computer Science and Communication Engineering, vol. 36 |
ISSN: | 2588-1086, 2615-9260 |
Description: | Chemical compounds (drugs) and diseases are among the most frequently searched keywords on PubMed, the database of biomedical literature, by biomedical researchers worldwide (according to a 2009 study). Working with PubMed is essential for researchers to gain insight into drugs' side effects, i.e., chemical-induced disease relations (CDR), which are essential for assessing drug safety and toxicity. It is, however, a heavy burden, as PubMed is a huge database of unstructured text that keeps growing rapidly (currently ~28 million scientific articles, with approximately two deposited per minute). As a result, biomedical text mining has empirically demonstrated its value to biomedical research communities. Biomedical text has its own distinctive and challenging properties, which have attracted much attention from the natural language processing community. A recent large-scale study (2018) showed that, for a biLSTM, feeding information into independent multiple input layers outperforms concatenating it into a single input layer, yielding better performance than state-of-the-art CDR classification models. This paper demonstrates that the opposite holds for a CNN: concatenation is better for CDR classification. To this end, we develop a CNN-based model that concatenates multiple inputs into a single input layer for CDR classification. Experimental results on the benchmark dataset show that it outperforms other recent state-of-the-art CDR classification models. Keywords: Chemical disease relation prediction, Convolutional neural network, Biomedical text mining |
Database: | OpenAIRE |
External link: |
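The abstract contrasts two ways of feeding several per-token inputs to a CNN. The following is a minimal, framework-free sketch of that contrast, not the authors' model: it assumes each token carries hypothetical feature vectors (for example, a word feature and a relative-position feature), uses one valid 1-D convolution with ReLU and max-pooling, and all filter and feature values are made up for illustration. `concat_features`, `single_concatenated` and `independent_multi_input` are illustrative names, not from the paper.

```python
from typing import List, Sequence

# Hypothetical layout: rows are tokens, columns are feature dimensions.
Matrix = List[List[float]]


def concat_features(feature_mats: Sequence[Matrix]) -> Matrix:
    """Single concatenated input: join each token's feature vectors into one row."""
    n_tokens = len(feature_mats[0])
    if any(len(m) != n_tokens for m in feature_mats):
        raise ValueError("all inputs must cover the same tokens")
    return [[v for m in feature_mats for v in m[t]] for t in range(n_tokens)]


def conv1d_relu_maxpool(x: Matrix, filt: Matrix) -> float:
    """Slide one filter (width x dim) over token windows, apply ReLU, max-pool."""
    width, dim = len(filt), len(filt[0])
    if len(x) < width or any(len(row) != dim for row in x):
        raise ValueError("input shorter than filter or dimension mismatch")
    scores = []
    for start in range(len(x) - width + 1):
        s = sum(filt[k][j] * x[start + k][j]
                for k in range(width) for j in range(dim))
        scores.append(max(s, 0.0))  # ReLU
    return max(scores)  # max-pooling over token positions


def single_concatenated(feature_mats: Sequence[Matrix],
                        filters: Sequence[Matrix]) -> List[float]:
    """One convolution layer applied to the concatenated input."""
    x = concat_features(feature_mats)
    return [conv1d_relu_maxpool(x, f) for f in filters]


def independent_multi_input(feature_mats: Sequence[Matrix],
                            filters_per_input: Sequence[Sequence[Matrix]]) -> List[float]:
    """A separate convolution per input; pooled outputs are concatenated afterwards."""
    out: List[float] = []
    for x, filters in zip(feature_mats, filters_per_input):
        out.extend(conv1d_relu_maxpool(x, f) for f in filters)
    return out


# Toy 3-token sentence with 1-dim word and position features (made-up values).
# Here the word cue and the position cue fall on different tokens...
word_apart = [[1.0], [0.0], [0.0]]
pos_apart = [[0.0], [0.0], [1.0]]
# ...and here they fall on the same token.
word_same = [[0.0], [1.0], [0.0]]
pos_same = [[0.0], [1.0], [0.0]]

concat_filter = [[[1.0, 1.0]]]          # one width-1 filter spanning both inputs
split_filters = [[[[1.0]]], [[[1.0]]]]  # the same weights, split per input

print(single_concatenated([word_apart, pos_apart], concat_filter))      # [1.0]
print(single_concatenated([word_same, pos_same], concat_filter))        # [2.0]
print(independent_multi_input([word_apart, pos_apart], split_filters))  # [1.0, 1.0]
print(independent_multi_input([word_same, pos_same], split_filters))    # [1.0, 1.0]
```

With the same weights, the independent scheme scores the two toy sentences identically, whereas a filter over the concatenated input responds more strongly when both cues occur in the same window. This is offered only as one possible intuition for why concatenation could suit a CNN; the abstract itself reports the empirical comparison.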