Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)–based ranking for concept normalization

Authors: Kris Brown, Jiacheng Zhang, Steven Bethard, Edmon Begoli, Dongfang Xu, Manoj Gopale
Language: English
Year of publication: 2020
Subject:
Unified Medical Language System
Computer science
Patient Discharge Summaries
Health Informatics
Research and Applications
RxNorm
Humans
Natural language processing
Transformer (machine learning model)
concept normalization
Training set
Artificial neural network
Deep learning
Systematized Nomenclature of Medicine
generate-and-rank
Relationship extraction
Artificial intelligence
Neural Networks, Computer
Encoder
Source: Journal of the American Medical Informatics Association (JAMIA)
ISSN: 1527-974X (online); 1067-5027 (print)
Description:
Objective: Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks, including relation extraction and information retrieval. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3: Concept Normalization.
Materials and Methods: The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer.
Results: Our generate-and-rank system placed third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy was further increased to 83.56% through improvements to how training data are generated from UMLS and through incorporation of our UMLS semantic type regularizer.
Discussion: Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training.
Conclusions: Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features, such as preferred terms and semantic types, with a neural network–based ranking model to accurately link phrases in text to UMLS concepts.
Database: OpenAIRE
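
The generate-and-rank pipeline summarized in the description lends itself to a compact illustration. The Python sketch below shows the two stages under loose assumptions: a cascade of sieves (training-data index, then UMLS preferred terms, then UMLS synonyms) proposes candidate concepts, and a listwise cross-entropy loss with a hinge-style stand-in for the UMLS semantic type regularizer scores them. All dictionaries, mentions, scores, the second CUI, and the weight lam are invented toy values; the published system uses Lucene indices over the shared task data and a BERT-based listwise ranker.

```python
# Toy sketch of the generate-and-rank approach described above; every
# value here is an invented placeholder, not the authors' actual data.
import math

# Candidate generation: a cascade of increasingly permissive sieves,
# tried in priority order (training data, then UMLS preferred terms,
# then UMLS synonyms), as in the abstract.
TRAIN_INDEX = {"heart attack": ["C0027051"]}               # mentions seen in training
PREFERRED_TERMS = {"myocardial infarction": ["C0027051"]}  # UMLS preferred terms
SYNONYMS = {"mi": ["C0027051", "C9999999"]}                # UMLS synonyms; 2nd CUI is fake

def generate_candidates(mention: str) -> list[str]:
    """Return candidates from the first sieve that matches the mention."""
    key = mention.lower().strip()
    for sieve in (TRAIN_INDEX, PREFERRED_TERMS, SYNONYMS):
        if key in sieve:
            return sieve[key]
    return ["CUI-less"]  # no sieve fired

# Ranking: listwise cross-entropy over candidate scores, plus a
# hinge-style penalty standing in for the semantic type regularizer:
# candidates whose semantic type disagrees with the gold concept's
# type are penalized whenever they outscore the gold candidate.
def listwise_loss(scores: list[float], gold: int,
                  type_matches: list[bool], lam: float = 0.1) -> float:
    z = sum(math.exp(s) for s in scores)
    cross_entropy = -math.log(math.exp(scores[gold]) / z)
    penalty = sum(max(0.0, s - scores[gold])
                  for s, ok in zip(scores, type_matches) if not ok)
    return cross_entropy + lam * penalty

if __name__ == "__main__":
    print(generate_candidates("MI"))  # ['C0027051', 'C9999999']
    # Scores a ranker might assign to the two candidates; gold is index 0.
    print(round(listwise_loss([2.0, 1.5], gold=0, type_matches=[True, False]), 3))
```

In the real system the scores come from BERT applied to the mention and each candidate's UMLS terms, and the regularizer is applied during training so that, at prediction time, candidates with the gold concept's semantic type tend to rank higher.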