Characterizing Sub-Cohorts via Data Normalization and Representation Learning

Authors:	Ozgur Ozmen, Merry Ward, Byung H. Park, Makoto Jones, Everett Rush, Kathryn Knight, Jonathan R. Nebeker, Clifton R. Baker
Year of publication:	2020
Subject:	020205 medical informatics; Computer science; business.industry; 02 engineering and technology; Medical classification; computer.software_genre; Missing data; Autoencoder; Pipeline (software); Database normalization; Set (abstract data type); 03 medical and health sciences; 0302 clinical medicine; Cohort; ComputingMilieux_COMPUTERSANDEDUCATION; 0202 electrical engineering, electronic engineering, information engineering; ComputingMilieux_COMPUTERSANDSOCIETY; 030212 general & internal medicine; Artificial intelligence; business; computer; Feature learning; Natural language processing
Source:	CBMS DOE / OSTI
Description:	Identifying a cohort of interest is a challenging task. It requires manually inspecting many patient records with complex structure, which may contain medical coding errors and missing data. This paper presents a computational pipeline for refining cohort selection based on the medical concepts recorded in electronic health records (EHRs). The pipeline extracts EHR data for a given cohort and normalizes the data using standard vocabularies. A stacked denoising autoencoder then embeds the normalized patient vectors in a low-dimensional space, where the patients are clustered into sub-cohorts. The goal is to represent the cohort in a standard format and to abstract variants of sub-populations. As a use case, we applied the pipeline to 1.8 million Veterans diagnosed with major depressive disorder (MDD) and identified four meaningful sub-cohorts using the features learned by the autoencoder. Each sub-cohort was then explored using a set of keywords for interpretation.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::91dbce806f481a9361f2b751559032db https://doi.org/10.1109/cbms49503.2020.00040
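
The stages named in the description (normalization to a standard vocabulary, denoising-autoencoder embedding, clustering) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Everything concrete here is an assumption made for illustration: the code-to-concept table, the toy records, the single tied-weight layer (the paper uses a stacked autoencoder), the masking-noise rate, the hidden size, and the choice of k-means as the clustering step, which the abstract does not name.

```python
import math
import random

# Hypothetical mapping from raw local codes to standard concepts (illustrative only).
LOCAL_TO_STANDARD = {
    "dx:296.2": "MDD", "dx:F32.9": "MDD",
    "dx:300.02": "ANXIETY", "dx:F41.1": "ANXIETY",
    "rx:sertraline 50mg": "SSRI", "rx:fluoxetine": "SSRI",
    "dx:G47.00": "INSOMNIA", "dx:780.52": "INSOMNIA",
}
CONCEPTS = sorted(set(LOCAL_TO_STANDARD.values()))

def normalize(record):
    """Map raw codes to standard concepts; codes without a mapping are dropped."""
    return {LOCAL_TO_STANDARD[c] for c in record if c in LOCAL_TO_STANDARD}

def to_vector(concepts):
    """Multi-hot patient vector over the fixed concept vocabulary."""
    return [1.0 if c in concepts else 0.0 for c in CONCEPTS]

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))

def corrupt(x, rate, rng):
    """Masking noise: zero each input independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]

class DenoisingAutoencoder:
    """One tied-weight layer; a stacked model would chain several such layers."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden  # encoder bias
        self.c = [0.0] * n_in      # decoder bias

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + self.c[i])
                for i in range(len(self.c))]

    def train_step(self, x, lr, rate, rng):
        """One SGD step: reconstruct the clean x from a corrupted copy."""
        x_noisy = corrupt(x, rate, rng)
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Cross-entropy loss with sigmoid outputs: output-layer delta is y - x.
        d_out = [yi - xi for yi, xi in zip(y, x)]
        d_hid = [sum(self.W[j][i] * d_out[i] for i in range(len(x))) * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights receive both the encoder and the decoder gradient.
                self.W[j][i] -= lr * (d_hid[j] * x_noisy[i] + d_out[i] * h[j])
            self.b[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.c[i] -= lr * d_out[i]

def nearest(p, centers):
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))

def kmeans(points, k, iters, rng):
    """Plain Lloyd iterations; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points]

rng = random.Random(0)
records = [  # toy patient records with invented raw codes
    ["dx:296.2", "rx:sertraline 50mg"],
    ["dx:F32.9", "rx:fluoxetine", "dx:unknown"],
    ["dx:F32.9", "dx:F41.1"],
    ["dx:296.2", "dx:300.02", "dx:G47.00"],
]
vectors = [to_vector(normalize(r)) for r in records]
dae = DenoisingAutoencoder(len(CONCEPTS), n_hidden=2, rng=rng)
for _ in range(200):
    for x in vectors:
        dae.train_step(x, lr=0.5, rate=0.25, rng=rng)
embeddings = [dae.encode(x) for x in vectors]
labels = kmeans(embeddings, k=2, iters=10, rng=rng)
```

In this toy data the first two records use different raw codes for the same concepts, so normalization maps them to the same patient vector and they necessarily fall in the same cluster. That is the effect the normalization stage is meant to have before any representation learning takes place.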