Popis: |
Motivation: Extracting biomedical knowledge from unstructured text is a major challenge in the biomedical field. Named entity recognition (NER) promises to improve information extraction and retrieval. However, existing approaches require manual annotation of large training text corpora, which is laborious and time-consuming. To address this problem, we adopted a deep learning technique that repurposes the 43,900,000 entity-free-text pairs available in metadata associated with the NCBI BioSample archive to train a scalable NER model. This NER model can assist in biospecimen metadata annotation by extracting named entities from user-supplied free-text descriptions. Results: We evaluated our model against two validation sets, one consisting of short phrases and the other of long sentences. We achieved an accuracy of 93.29% on the short-phrase validation set and 93.40% on the long-sentence validation set. Availability: All the analyses, pre-trained model, environments, and Jupyter notebooks pertaining to this manuscript are available on GitHub (https://github.com/brianyiktaktsui/DEEP_NLP). Contact: hkcarter@ucsd.edu |