Joint learning of representations of medical concepts and words from EHR data
Authors: | Slobodan Vucetic, Tian Bai, Ashis Kumar Chanda, Brian L. Egleston |
---|---|
Year of publication: | 2017 |
Subject: |
0301 basic medicine, 03 medical and health sciences, 030104 developmental biology, 0302 clinical medicine, 030212 general & internal medicine, Topic model, Vocabulary, Computer science, Medical classification, Data modeling, Word2vec, Diagnosis code, Artificial intelligence, Natural language processing |
Source: | BIBM (IEEE International Conference on Bioinformatics and Biomedicine) |
DOI: | 10.1109/bibm.2017.8217752 |
Abstract: | There has been increasing interest in learning low-dimensional vector representations of medical concepts from electronic health records (EHRs). While EHRs contain structured data such as diagnostic codes and laboratory tests, they also contain unstructured clinical notes, which provide more nuanced details on a patient's health status. In this work, we propose a method that jointly learns representations of medical concepts and words. In particular, we capture the relationship between medical codes and words through a novel learning scheme for the word2vec model. Our method exploits relationships between different parts of the EHR within the same visit and embeds both codes and words in the same continuous vector space. As a result, we are able to derive clusters that reflect distinct disease and treatment patterns. In our experiments, we qualitatively compare how our method groups words for given diagnostic codes with a topic modeling approach. We also test how well our representations predict disease patterns of the next visit. The results show that our approach outperforms several common methods. |
Database: | OpenAIRE |
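The abstract does not spell out the paper's learning scheme, so the following is only a minimal sketch of the general idea it describes: placing diagnosis codes and note words from the same visit into one shared word2vec-style vector space. It assumes a skip-gram model with negative sampling, where every code in a visit is paired with every word of that visit's notes and words are additionally paired with nearby words. The pairing rule and the names `visit_pairs`, `train_sgns`, and `nearest_words` are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def visit_pairs(codes, words, window=2):
    """Hypothetical pairing: code<->word across the whole visit, word<->word within a window."""
    pairs = []
    for c in codes:
        for w in words:
            pairs.append((c, w))  # code predicts a word from the same visit
            pairs.append((w, c))  # word predicts a code from the same visit
    for i, w in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((w, words[j]))  # ordinary skip-gram context
    return pairs


def train_sgns(visits, dim=16, epochs=50, lr=0.05, neg=3, seed=0):
    """Skip-gram with negative sampling over (center, context) pairs; visits are (codes, words)."""
    rng = np.random.default_rng(seed)
    vocab = sorted({t for codes, words in visits for t in codes + words})
    idx = {t: i for i, t in enumerate(vocab)}
    W_in = rng.normal(scale=0.1, size=(len(vocab), dim))  # embeddings for codes and words alike
    W_out = np.zeros((len(vocab), dim))                   # context vectors
    pairs = [(idx[a], idx[b]) for codes, words in visits for a, b in visit_pairs(codes, words)]
    for _ in range(epochs):
        for center, ctx in pairs:
            # One positive target plus `neg` uniformly sampled negatives (simplified sampling).
            targets = [ctx] + list(rng.integers(0, len(vocab), size=neg))
            labels = [1.0] + [0.0] * neg
            v = W_in[center]
            grad_v = np.zeros(dim)
            for t, y in zip(targets, labels):
                score = 1.0 / (1.0 + np.exp(-v @ W_out[t]))
                g = lr * (y - score)
                grad_v += g * W_out[t]  # uses W_out[t] before its update
                W_out[t] += g * v
            W_in[center] += grad_v
    return vocab, W_in


def nearest_words(code, vocab, emb, words, k=3):
    """Rank candidate words by cosine similarity to a code in the shared space."""
    norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    q = norm[vocab.index(code)]
    sims = [(float(norm[vocab.index(w)] @ q), w) for w in words if w in vocab]
    return [w for _, w in sorted(sims, reverse=True)[:k]]
```

Because codes and words share one embedding matrix, a query such as `nearest_words("D1", vocab, emb, ["a", "b", "c"])` is one plausible way to group words around a given diagnostic code. The uniform negative sampling is a simplification; standard word2vec draws negatives from a smoothed unigram distribution.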