Interpretable Phenotyping for Electronic Health Records
Authors: | Christine Allen, Ankur Teredesai, Vikas Kumar, Juhua Hu, Muhammad Aurangzeb Ahmad |
---|---|
Year of publication: | 2021 |
Subjects: | Dimensionality reduction, Machine learning, Empirical research, Health care, Feature (machine learning), Unsupervised learning, Observational study, Quality (business), Artificial intelligence, Interpretability |
Source: | ICHI |
DOI: | 10.1109/ichi52183.2021.00034 |
Description: | Datasets from Electronic Health Records (EHRs) are increasingly large and complex, which makes them difficult to use for predictive modeling. The two major challenges are scale and high dimensionality. One common way to address the challenge of scale is to use data phenotypes: clinically relevant groupings of characteristics that can be expressed as logical queries (e.g., "senior patients with diabetes"). As machine learning is used more widely across the continuum of care, phenotypes play an important role in modeling for population management, clinical trials, observational and interventional research, and quality measures. Yet phenotypes can be difficult to interpret and may require post-hoc clarification from experienced clinicians. For example, detailed analysis may be needed to discover that all patients in a phenotype are diabetic seniors with complications from a previous surgery. Moreover, the high-dimensionality problem is often addressed by dimensionality reduction methods, applied either separately from or simultaneously with phenotyping, and these methods may further hinder interpretability. In this paper, we introduce the notion of interpretable data phenotypes generated by an unsupervised learning technique. The methods are designed to disambiguate relative feature memberships, thereby facilitating general clinical validation and alleviating the high-dimensionality problem. The empirical study applies the proposed unsupervised interpretable phenotyping method to a real-world healthcare dataset (MIMIC) and uses hospital length of stay as a reference prediction task. The results demonstrate that the proposed method produces phenotypes with improved interpretability without diminishing the quality of prediction results. |
Database: | OpenAIRE |
External link: |
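The description defines a data phenotype as a clinically relevant grouping that can be expressed as a logical query, such as "senior patients with diabetes." The Python sketch below shows what such a query can look like when written down explicitly. It is a minimal sketch under stated assumptions. It is not the authors' unsupervised phenotyping method. The `Condition` and `Phenotype` classes, the feature names `age` and `diabetes`, and the age threshold of 65 for "senior" are all hypothetical choices made for illustration. The sketch models a phenotype as a conjunction (logical AND) of readable clauses, so that both the cohort and its definition can be inspected directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical patient record: a flat mapping from feature name to value.
Patient = Dict[str, object]


@dataclass(frozen=True)
class Condition:
    """One atomic clause of a phenotype query (hypothetical structure)."""
    feature: str
    description: str
    test: Callable[[object], bool]

    def holds(self, patient: Patient) -> bool:
        # A missing feature never satisfies the clause.
        return self.feature in patient and self.test(patient[self.feature])


@dataclass(frozen=True)
class Phenotype:
    """A phenotype expressed as a conjunction (logical AND) of conditions."""
    name: str
    conditions: Tuple[Condition, ...]

    def matches(self, patient: Patient) -> bool:
        return all(c.holds(patient) for c in self.conditions)

    def describe(self) -> str:
        # Human-readable form of the query, e.g. for clinician review.
        return " AND ".join(c.description for c in self.conditions)

    def members(self, patients: List[Patient]) -> List[Patient]:
        return [p for p in patients if self.matches(p)]


# Assumption: "senior" means age >= 65; the feature names are illustrative only.
senior_diabetic = Phenotype(
    name="senior patients with diabetes",
    conditions=(
        Condition("age", "age >= 65", lambda v: v >= 65),
        Condition("diabetes", "diabetes = yes", lambda v: v is True),
    ),
)

# Toy records, invented for illustration (not drawn from MIMIC).
patients: List[Patient] = [
    {"id": 1, "age": 72, "diabetes": True},
    {"id": 2, "age": 58, "diabetes": True},
    {"id": 3, "age": 80, "diabetes": False},
    {"id": 4, "age": 67},  # diabetes status unknown
]

print(senior_diabetic.describe())  # age >= 65 AND diabetes = yes
print([p["id"] for p in senior_diabetic.members(patients)])  # [1]
```

Every clause is explicit, so a reviewer can read the cohort definition directly instead of reconstructing it post hoc. That property is the interpretability the description aims for; how the paper's method actually derives and disambiguates feature memberships is not shown here.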