Bag-of-Attributes Representation: A Vector Space Model for Electronic Health Records Analysis in OMOP

Authors: Marco Antonio Gutierrez, Oscar Cuadros Linares, Daniel Lima, Caetano Traina, Bruno S. Faiçal, Jose Maria Clementino, Agma J. M. Traina, Christian C. Bones
Year of publication: 2020
Subject:
Source: CBMS
DOI: 10.1109/cbms49503.2020.00045
Description: Several studies have been performed worldwide to improve health services using data generated by digital medical systems. The increasing volume of data generated by these systems makes knowledge discovery and data analysis techniques essential to improving the quality of the health services offered by medical facilities. However, the literature shows a gap regarding generic and flexible vector space models (VSM) that are well adapted to handling electronic health records (EHR), so each knowledge discovery effort must develop its own VSM or other representation model. This restriction can make comparative evaluation of different methods nonviable for knowledge discovery tasks over clinical pathways. To address this scenario, we propose the Bag-of-Attributes Representation (BOAR). BOAR represents EHRs in an n-dimensional vector space. Because BOAR takes advantage of the OMOP (Observational Medical Outcomes Partnership) standard, it can represent records retrieved from different data models. The experimental results show that BOAR is flexible and robust in representing EHRs from several sources, and that it allows the execution and evaluation of several clustering algorithms.
Database: OpenAIRE