BioPREP: Deep learning-based predicate classification with SemMedDB
Author: | Gibong Hong, Yuheun Kim, Min Song, YeonJung Choi |
---|---|
Year of publication: | 2021 |
Subject: | Artificial neural network; Computer science; Deep learning; Information Storage and Retrieval; Health Informatics; Verb; Relationship extraction; Predicate (grammar); Computer Science Applications; Set (abstract data type); Machine Learning; Key (cryptography); Feature (machine learning); Artificial intelligence; Neural Networks, Computer; Natural language processing; Algorithms |
Source: | Journal of Biomedical Informatics. 122 |
ISSN: | 1532-0480 |
Description: | Relation Extraction (RE), the task of inferring relations between entities in text, has become key to biomedical information extraction. Previous studies relied on rule-based and machine learning-based approaches, which required extensive feature processing yet achieved relatively low accuracy. Some existing biomedical relation extraction tools are based on neural networks, but they rarely analyze possible causes of the difference in accuracy among predicates. In addition, few biomedical datasets have been structured for predicate classification. To address these gaps, we set two research goals: (1) constructing a large-scale training dataset, Biomedical Predicate Relation-extraction with Entity-filtering by PKDE4J (BioPREP), based on SemMedDB and using PKDE4J as an entity-filtering tool; and (2) evaluating the performance of each neural network-based algorithm on the structured dataset. We then analyzed our model’s performance in depth by grouping predicates into semantic clusters. The experiments showed that the BioBERT-based model outperformed the other models for predicate classification. The suggested model achieved an F1-score of 0.846 when BioBERT was loaded as the pre-trained model and 0.840 when SciBERT weights were loaded. Moreover, the semantic cluster analysis showed that sentences containing key phrases, such as a comparison verb + ‘than’, were classified better. |
Database: | OpenAIRE |
External link: |
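
The description reports F1-scores for predicate classification and compares accuracy across predicates. As a rough illustration of how such scores can be computed, the sketch below derives a per-predicate F1 and an unweighted (macro) average from gold and predicted labels. The record does not say which averaging scheme the authors used, so macro averaging is an assumption here, and the labels and predictions are made-up toy data, not results from the paper.

```python
from collections import Counter


def per_class_f1(gold, pred):
    """Return {label: F1} for every label that appears in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correct prediction for this predicate
        else:
            fp[p] += 1      # predicted label was wrong
            fn[g] += 1      # gold label was missed
    scores = {}
    for label in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Toy data for illustration only; the predicate names are used as example labels.
gold = ["TREATS", "CAUSES", "TREATS", "COMPARED_WITH", "CAUSES", "TREATS"]
pred = ["TREATS", "TREATS", "TREATS", "COMPARED_WITH", "CAUSES", "CAUSES"]

if __name__ == "__main__":
    for label, score in sorted(per_class_f1(gold, pred).items()):
        print(f"{label}: {score:.3f}")
    print(f"macro F1: {macro_f1(gold, pred):.3f}")
```

Keeping the per-predicate scores separate before averaging is what makes a per-predicate or per-cluster comparison, like the one described above, possible.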