Learning probabilistic phenotypes from heterogeneous EHR data
Authors: | Edouard Grave, Chris H. Wiggins, Rimma Pivovarov, John Angiolillo, Noémie Elhadad, Adler J. Perotte |
---|---|
Language: | English |
Subject: |
Electronic health record; Clinical phenotype modeling; Inference; Health Informatics; Disease; Phenome; Machine learning; computer.software_genre; Data type; Article; Medical information systems; Probabilistic modeling; Electronic Health Records; Humans; Learning; Medicine; Graphical model; Probability; Computational model; Computational disease models; business.industry; Probabilistic logic; 3. Good health; Computer Science Applications; Phenotype; Phenotyping; Data mining; Diagnosis code; Artificial intelligence; business; computer |
Source: | J Biomed Inform |
ISSN: | 1532-0464 |
DOI: | 10.1016/j.jbi.2015.10.001 |
Description: | We present the UPhenome model, which derives phenotypes in an unsupervised manner. UPhenome scales easily to large sets of diseases and clinical observations. The learned phenotypes combine clinical text, ICD9 codes, lab tests, and medications. UPhenome learns phenotypes that can be applied to unseen patient records. We present the Unsupervised Phenome Model (UPhenome), a probabilistic graphical model for large-scale discovery of computational models of disease, or phenotypes. We tackle this challenge through the joint modeling of a large set of diseases and a large set of clinical observations. The observations are drawn directly from heterogeneous patient record data (notes, laboratory tests, medications, and diagnosis codes), and the diseases are modeled in an unsupervised fashion. We apply UPhenome to two qualitatively different mixtures of patients and diseases: records of extremely sick patients in the intensive care unit with constant monitoring, and records of outpatients regularly followed by care providers over multiple years. We demonstrate that the UPhenome model can learn from these different care settings without any additional adaptation. Our experiments show that (i) the learned phenotypes combine the heterogeneous data types more coherently than baseline LDA-based phenotypes; (ii) they represent single diseases, rather than a mix of diseases, more often than the baseline phenotypes do; and (iii) when applied to unseen patient records, they are correlated with the patients' ground-truth disorders. Code for training, inference, and quantitative evaluation is made available to the research community. |
Database: | OpenAIRE |
External link: |