Development and Evaluation of a Similarity Measure for Medical Event Sequences
Authors: | Joel Fredrickson, Michael V. Mannino, Farnoush Banaei-Kashani, Iris Corrêa das Chagas Linck, Raghda Alqurashi |
---|---|
Publication year: | 2017 |
Subject: | Normalization (statistics); General Computer Science; Hierarchical coding; Computer science; Nearest neighbor search; Skew; 020207 software engineering; 02 engineering and technology; Similarity measure; Management Information Systems; 03 medical and health sciences; 0302 clinical medicine; Claims data; Subsequence; 0202 electrical engineering, electronic engineering, information engineering; 030212 general & internal medicine; Data mining; Coding (social sciences) |
Source: | ACM Transactions on Management Information Systems. 8:1-26 |
ISSN: | 2158-6578, 2158-656X |
DOI: | 10.1145/3070684 |
Abstract: | We develop a similarity measure for medical event sequences (MESs) and empirically evaluate it using U.S. Medicare claims data. Existing similarity measures do not use unique characteristics of MESs and have never been evaluated on real MESs. Our similarity measure, the Optimal Temporal Common Subsequence for Medical Event Sequences (OTCS-MES), provides a matching component that integrates event prevalence, event duplication, and hierarchical coding, which are important elements of MESs. The OTCS-MES also uses normalization to mitigate the impact of heavy positive skew of matching events and compact distribution of event prevalence. We empirically evaluate the OTCS-MES measure against two other measures specifically designed for MESs, the original OTCS and Artemis, a measure incorporating event alignment. Our evaluation uses two substantial data sets of Medicare claims data containing inpatient and outpatient sequences with different medical event coding. We find a small overlap in nearest neighbors among the three similarity measures, demonstrating the superior design of the OTCS-MES with its emphasis on unique aspects of MESs. The evaluation also provides evidence about the impact of component weights, neighborhood size, and sequence length. |
Database: | OpenAIRE |
External link: |
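The abstract names three ingredients of the matching component: event prevalence, event duplication, and hierarchical coding, plus a normalization step. The sketch below is a minimal, hypothetical illustration of how such ingredients *could* be combined. It is **not** the OTCS-MES formula from the paper: it ignores event order and timing entirely, and every rule in it is an assumption. These assumptions are inverse-prevalence weights (`-log p`), duplicate matches capped at the smaller count, a hierarchy defined by code prefixes (first `category_level` characters), cosine-style normalization, and a blend weight `beta`. All function names and the ICD-like codes in the usage example are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def truncate(seq: Sequence[str], level: int) -> List[str]:
    # Hypothetical hierarchy rule: the first `level` characters of a code name its parent category.
    return [code[:level] for code in seq]


def prevalence_table(corpus: Sequence[Sequence[str]]) -> Dict[str, float]:
    # Share of sequences in which each code occurs at least once.
    doc_freq = Counter(code for seq in corpus for code in set(seq))
    return {code: n / len(corpus) for code, n in doc_freq.items()}


def weighted_overlap(a: Sequence[str], b: Sequence[str], prevalence: Dict[str, float]) -> float:
    # Duplicated events match at most min(count_a, count_b) times (assumption);
    # rarer codes weigh more via -log(prevalence) (assumption, IDF-style).
    ca, cb = Counter(a), Counter(b)
    return sum(min(ca[c], cb[c]) * -math.log(prevalence[c]) for c in ca.keys() & cb.keys())


def normalized_overlap(a: Sequence[str], b: Sequence[str], prevalence: Dict[str, float]) -> float:
    # Cosine-style normalization keeps the score in [0, 1] regardless of sequence length.
    denom = math.sqrt(weighted_overlap(a, a, prevalence) * weighted_overlap(b, b, prevalence))
    return weighted_overlap(a, b, prevalence) / denom if denom > 0 else 0.0


def similarity(a: Sequence[str], b: Sequence[str], corpus: Sequence[Sequence[str]],
               category_level: int = 3, beta: float = 0.7) -> float:
    # Blend exact-code matching with coarser category-level matching (hypothetical weights).
    # Assumes every code in `a` and `b` also appears somewhere in `corpus`.
    full = normalized_overlap(a, b, prevalence_table(corpus))
    cat_corpus = [truncate(seq, category_level) for seq in corpus]
    cat = normalized_overlap(truncate(a, category_level), truncate(b, category_level),
                             prevalence_table(cat_corpus))
    return beta * full + (1 - beta) * cat


if __name__ == "__main__":
    corpus = [["4280", "4280", "25000"], ["4281", "25000"], ["486", "4019"], ["4019", "25000"]]
    print(similarity(corpus[0], corpus[1], corpus))  # partial match via category "428"
    print(similarity(corpus[0], corpus[2], corpus))  # no shared codes or categories
```

In this toy corpus, the first two sequences share only a common code at the full level, but their heart-failure codes (`4280`, `4281`) agree at the category level, so the blend rewards them more than an exact-match-only score would. Ranking a corpus by such a score and taking the top *k* gives nearest neighbors, the quantity the abstract uses to compare measures.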