Robust Ensemble Learning to Identify Rare Disease Patients from Electronic Health Records
Author: | Kristin Glass, Christopher Rudolf, Rich Colbaugh, Mike Tremblay |
---|---|
Year of publication: | 2018 |
Subject: | Computer science; Feature selection; 02 engineering and technology; Gold standard (test); Health records; Machine learning; Ensemble learning; 03 medical and health sciences; Rare Diseases; 0302 clinical medicine; 0202 electrical engineering, electronic engineering, information engineering; Cluster analysis; Electronic Health Records; Humans; 020201 artificial intelligence & image processing; 030212 general & internal medicine; Artificial intelligence; Medical diagnosis; Algorithms; Rare disease |
Source: | EMBC (Annual International Conference of the IEEE Engineering in Medicine and Biology Society) |
DOI: | 10.1109/embc.2018.8513241 |
Abstract: | There is substantial interest in developing prediction models capable of identifying rare disease patients in population-scale databases such as electronic health records (EHRs). Deriving these models is challenging for many reasons, perhaps the most important being the limited number of patients with 'gold standard' confirmed diagnoses from which to learn. This paper presents a new cascade learning methodology that induces accurate prediction models from noisy 'silver standard' labeled data: patients provisionally labeled as positive for the target disease based on unconfirmed evidence. The algorithm combines unsupervised feature selection, supervised ensemble learning, and unsupervised ensemble clustering to enable robust learning from noisy labels. The efficacy of the approach is illustrated through a case study involving the detection of lipodystrophy patients in a country-scale database of EHRs. The case study demonstrates that our algorithm outperforms state-of-the-art prediction techniques and can discover previously undiagnosed patients in large EHR databases. |
Database: | OpenAIRE |
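The abstract names the three stages of the cascade but not the models used in each. The Python sketch below is a hypothetical, heavily simplified illustration of how such a cascade could be wired together, not the authors' algorithm: a variance threshold stands in for unsupervised feature selection, bagged nearest-centroid classifiers for the supervised ensemble, and repeated 2-means clustering for unsupervised ensemble clustering. All function names, thresholds, and the toy data are assumptions made for illustration.

```python
import random
from statistics import mean, pvariance

# Hypothetical sketch of a three-stage cascade for learning from noisy
# "silver standard" labels; NOT the algorithm from the paper.

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [mean(col) for col in zip(*rows)]

def select_features(X, min_var=0.01):
    """Stage 1 (unsupervised): keep columns whose variance exceeds min_var.
    Labels are not used, so noisy silver labels cannot bias this step."""
    return [j for j in range(len(X[0]))
            if pvariance([row[j] for row in X]) > min_var]

def bagged_centroid_scores(X, y, n_models=25, seed=0):
    """Stage 2 (supervised ensemble): fit nearest-centroid classifiers on
    bootstrap samples of the silver labels y; return, per patient, the
    fraction of fitted models that vote positive."""
    rng = random.Random(seed)
    votes, fitted = [0] * len(X), 0
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        pos = [X[i] for i in idx if y[i] == 1]
        neg = [X[i] for i in idx if y[i] == 0]
        if not pos or not neg:
            continue  # skip bootstrap samples that contain only one class
        c_pos, c_neg = centroid(pos), centroid(neg)
        fitted += 1
        for i, x in enumerate(X):
            votes[i] += sq_dist(x, c_pos) < sq_dist(x, c_neg)
    return [v / max(fitted, 1) for v in votes]

def kmeans2(X, rng, iters=20):
    """Plain 2-means: a fixed number of Lloyd iterations from two random rows."""
    centers = rng.sample(X, 2)
    assign = [0] * len(X)
    for _ in range(iters):
        assign = [0 if sq_dist(x, centers[0]) <= sq_dist(x, centers[1]) else 1
                  for x in X]
        for k in (0, 1):
            members = [x for x, a in zip(X, assign) if a == k]
            if members:
                centers[k] = centroid(members)
    return assign

def cluster_consensus(X, scores, n_runs=10, seed=1):
    """Stage 3 (unsupervised ensemble clustering): repeat 2-means with
    different initializations; in each run the cluster with the higher mean
    ensemble score is called "positive". Returns the fraction of runs that
    place each patient in the positive cluster."""
    rng = random.Random(seed)
    hits = [0] * len(X)
    for _ in range(n_runs):
        assign = kmeans2(X, rng)
        cluster_means = [mean([s for s, a in zip(scores, assign) if a == k] or [0.0])
                         for k in (0, 1)]
        pos_k = 0 if cluster_means[0] >= cluster_means[1] else 1
        for i, a in enumerate(assign):
            hits[i] += a == pos_k
    return [h / n_runs for h in hits]

def cascade_predict(X, silver, threshold=0.5):
    """Flag a patient only when the supervised ensemble score and the
    cluster consensus both reach the threshold."""
    cols = select_features(X)
    Xs = [[row[j] for j in cols] for row in X]
    scores = bagged_centroid_scores(Xs, silver)
    agree = cluster_consensus(Xs, scores)
    return [int(s >= threshold and a >= threshold) for s, a in zip(scores, agree)]

# Invented toy data: two informative features plus one constant column.
# Row 3 resembles the silver-positive rows but is labeled 0.
X_toy = [[0.9, 1.0, 0.5], [1.1, 0.9, 0.5], [1.0, 1.1, 0.5], [0.95, 1.05, 0.5],
         [0.0, 0.1, 0.5], [0.1, 0.0, 0.5], [0.05, 0.05, 0.5], [0.1, 0.1, 0.5],
         [0.0, 0.0, 0.5]]
silver_toy = [1, 1, 1, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print(cascade_predict(X_toy, silver_toy))
```

Requiring agreement between the label-driven ensemble and the label-free cluster consensus is one simple way (assumed here, not taken from the paper) to keep a few mislabeled silver-standard patients from dominating the result. In the toy data, row 3 carries a silver label of 0 yet lies with the positive rows, mimicking a previously undiagnosed patient that the cascade should flag.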