Creating a scalable deep learning based Named Entity Recognition Model for biomedical textual data by repurposing BioSample free-text annotations

Author: Carter, Hannah; Mollah, Shamim; Skola, Dylan; Dow, Michelle; Hsu, Chun-Nan; Tsui, Brian
Publication year: 2018
Description: Motivation: Extracting biomedical knowledge from unstructured text is a major challenge in the biomedical field. Named entity recognition (NER) promises to improve information extraction and retrieval. However, existing approaches require manual annotation of large training text corpora, which is laborious and time-consuming. To address this problem, we adopted a deep learning technique that repurposes the 43,900,000 (entity, free-text) pairs available in metadata associated with the NCBI BioSample archive to train a scalable NER model. This NER model can assist in biospecimen metadata annotation by extracting named entities from user-supplied free-text descriptions. Results: We evaluated our model on two validation sets, one consisting of short phrases and the other of long sentences. The model achieved accuracies of 93.29% and 93.40% on the short-phrase and long-sentence validation sets, respectively. Availability: All analyses, the pre-trained model, environments, and Jupyter notebooks pertaining to this manuscript are available on GitHub (https://github.com/brianyiktaktsui/DEEP_NLP). Contact: hkcarter@ucsd.edu
Database: OpenAIRE