Learning relevance models for patient cohort retrieval.
Author: | Goodwin TR; Department of Computer Science, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, Texas, USA. Harabagiu SM; Department of Computer Science, Human Language Technology Research Institute, University of Texas at Dallas, Richardson, Texas, USA. |
---|---|
Language: | English |
Source: | JAMIA open [JAMIA Open] 2018 Oct; Vol. 1 (2), pp. 265-275. Date of Electronic Publication: 2018 Sep 28. |
DOI: | 10.1093/jamiaopen/ooy010 |
Abstract: | Objective: We explored how judgements provided by physicians can be used to learn relevance models that enhance the quality of patient cohorts retrieved from Electronic Health Records (EHRs) collections. Methods: A very large number of features were extracted from patient cohort descriptions as well as EHR collections. The features were used to investigate retrieving (1) neurology-specific patient cohorts from the de-identified Temple University Hospital electroencephalography (EEG) Corpus as well as (2) the more general cohorts evaluated in the TREC Medical Records Track (TRECMed) from the de-identified hospital records provided by the University of Pittsburgh Medical Center. The features informed a learning relevance model (LRM) that took advantage of relevance judgements provided by physicians. The LRM implements a pairwise learning-to-rank framework, which enables our learning patient cohort retrieval (L-PCR) system to learn from physicians' feedback. Results and Discussion: We evaluated the L-PCR system against state-of-the-art traditional patient cohort retrieval systems, and observed a 27% improvement when operating on EEGs and a 53% improvement when operating on TRECMed EHRs, showing the promise of the L-PCR system. We also performed extensive feature analyses to reveal the most effective strategies for representing cohort descriptions as queries, encoding EHRs, and measuring cohort relevance. Conclusion: The L-PCR system has significant promise for reliably retrieving patient cohorts from EHRs in multiple settings when trained with relevance judgments. When provided with additional cohort descriptions, the L-PCR system will continue to learn, thus offering a potential solution to the performance barriers of current cohort retrieval systems. |
Database: | MEDLINE |
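
The abstract states that the LRM implements a pairwise learning-to-rank framework trained on physicians' relevance judgements. As a rough illustration of that general idea only, and not the authors' L-PCR implementation, the sketch below turns graded judgements into preference pairs and fits a linear scoring function with a logistic loss. The two-element feature vectors, the toy judgements, the choice of loss, and all function names (`make_pairs`, `train_pairwise`, `rank`) are assumptions made for this example.

```python
import math
import random


def make_pairs(judged):
    """Turn graded judgements into difference vectors (preferred minus other).

    judged maps a cohort-description id to a list of (features, relevance)
    tuples; pairs are only formed between records judged for the same id.
    """
    pairs = []
    for records in judged.values():
        for xi, yi in records:
            for xj, yj in records:
                if yi > yj:
                    pairs.append([a - b for a, b in zip(xi, xj)])
    return pairs


def train_pairwise(judged, epochs=200, lr=0.1, seed=0):
    """Fit linear weights w so that w.x is higher for the preferred record."""
    pairs = make_pairs(judged)
    w = [0.0] * len(pairs[0])
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for d in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            # Gradient step on log(1 + exp(-margin)); step size shrinks
            # as the pair becomes correctly ordered with a large margin.
            step = lr / (1.0 + math.exp(margin))
            w = [wk + step * dk for wk, dk in zip(w, d)]
    return w


def rank(w, candidates):
    """Return candidate feature vectors sorted by descending score w.x."""
    return sorted(candidates, key=lambda x: -sum(wk * xk for wk, xk in zip(w, x)))


# Hypothetical toy judgements: features are [query-match score, length penalty],
# relevance is a physician-style grade (higher means more relevant).
judged = {
    "cohort_1": [([0.9, 0.2], 2), ([0.5, 0.8], 1), ([0.1, 0.5], 0)],
    "cohort_2": [([0.8, 0.3], 1), ([0.2, 0.1], 0)],
}

w = train_pairwise(judged)
ranked = rank(w, [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]])
print(ranked)
```

Training on pairs rather than on absolute grades means only the relative order of records for the same cohort description matters, which is what makes it possible to keep learning as further judged cohort descriptions are added.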