D3NER: biomedical named entity recognition using CRF-biLSTM improved with fine-tuned embeddings of various linguistic information
Authors: | Trang M. Nguyen, Hoang-Quynh Le, Thanh Hai Dang, Sinh T. Vu |
---|---|
Year of publication: | 2018 |
Subjects: |
0301 basic medicine; Statistics and Probability; Conditional random field; Relation (database); Computer science; Stability (learning theory); computer.software_genre; Biochemistry; 03 medical and health sciences; 0302 clinical medicine; Named-entity recognition; Rule-based machine translation; Humans; 030212 general & internal medicine; Molecular Biology; business.industry; Proteins; Linguistics; Molecular Sequence Annotation; Computer Science Applications; Named entity; Computational Mathematics; Benchmarking; 030104 developmental biology; Computational Theory and Mathematics; Benchmark (computing); Artificial intelligence; business; computer; Natural language processing; Software |
Source: | Bioinformatics (Oxford, England). 34(20) |
ISSN: | 1367-4811 |
Abstract: | Motivation: Recognizing biomedical named entities in the literature is a challenging research topic of great interest, because it is a prerequisite for extracting the large amount of valuable biomedical knowledge stored in unstructured text and transforming it into well-structured formats. Long short-term memory (LSTM) networks have recently been used with great success in various biomedical named entity recognition (NER) models. However, these models often do not exploit all useful linguistic information, and several of their aspects can still be improved. Results: We propose D3NER, a novel biomedical NER model that uses conditional random fields and bidirectional long short-term memory, improved with fine-tuned embeddings of various linguistic information. D3NER is thoroughly compared with seven recent state-of-the-art NER models, two of which are joint models with named entity normalization (NEN), an approach shown to improve NER performance. Experimental results on benchmark datasets, namely the BioCreative V Chemical Disease Relation (BC5 CDR) corpus, the NCBI Disease corpus and the FSU-PRGE gene/protein corpus, show that in almost all cases D3NER outperforms, and is more stable than, all compared models for chemical and gene/protein NER, and all models that, like D3NER, are not jointly trained with NEN for disease NER. On the BC5 CDR corpus, D3NER achieves F1 scores of 93.14% and 84.68% for chemical and disease NER, respectively; on the NCBI Disease corpus, its F1 for disease NER is 84.41%. Its F1 for gene/protein NER on FSU-PRGE is 87.62%. Availability and implementation: Data and source code are available at https://github.com/aidantee/D3NER. Supplementary information: Supplementary data are available at Bioinformatics online. |
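The abstract names a conditional random field (CRF) layer on top of a bidirectional LSTM. As a rough illustration of what such a CRF output layer does at prediction time, the sketch below shows generic linear-chain CRF Viterbi decoding: given per-token tag scores (the kind a BiLSTM would emit) and tag-to-tag transition scores, it finds the single highest-scoring tag sequence. This is not D3NER's code (that is in the linked repository); the function name `viterbi_decode`, the tag set `TAGS`, and all score values are hypothetical toy inputs chosen for illustration.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score (generic linear-chain CRF).

    emissions[t][j]   : score of tag j at token t (e.g. produced by a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    # best[j]: best total score of any tag path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Pick the previous tag i that maximizes best[i] + transitions[i][j]
            i_star = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers to recover the path
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical toy example: a BIO tag set for a single entity type
TAGS = ["O", "B-Chemical", "I-Chemical"]
# Hypothetical per-token scores for three tokens (rows) over TAGS (columns)
EMISSIONS = [
    [0.1, 2.0, 1.0],
    [0.5, 0.2, 1.5],
    [2.0, 0.3, 0.4],
]
# Hypothetical transitions; O -> I-Chemical is heavily penalized (invalid in BIO)
TRANSITIONS = [
    [0.0, 0.0, -10.0],
    [0.0, -1.0, 1.0],
    [0.0, -1.0, 1.0],
]

if __name__ == "__main__":
    path, score = viterbi_decode(EMISSIONS, TRANSITIONS)
    print([TAGS[i] for i in path], score)  # ['B-Chemical', 'I-Chemical', 'O'] 6.5
```

The transition matrix is what lets a CRF layer enforce label consistency across neighboring tokens (here, discouraging an `I-` tag right after `O`), which a per-token softmax over BiLSTM outputs cannot do on its own.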
Database: | OpenAIRE |