Use of Natural Language Processing to Improve Identification of Patients With Peripheral Artery Disease
Author: | Steven J. Lippmann, W. Schuyler Jones, E. Hope Weissler, Shelley A. Rusincovitch, Jikai Zhang, Ricardo Henao |
---|---|
Year of publication: | 2020 |
Subject: | Male; medicine.medical_specialty; Arterial disease; Disease; 030204 cardiovascular system & hematology; Amputation, Surgical; 03 medical and health sciences; Peripheral Arterial Disease; 0302 clinical medicine; Predictive Value of Tests; Medicine; Data Mining; Electronic Health Records; Humans; Ankle Brachial Index; 030212 general & internal medicine; Diagnosis, Computer-Assisted; Intensive care medicine; Aged; Natural Language Processing; Aged, 80 and over; business.industry; Endovascular Procedures; Reproducibility of Results; Middle Aged; Identification (biology); Female; Cardiology and Cardiovascular Medicine; business; Vascular Surgical Procedures; Cohort study |
Source: | Circulation: Cardiovascular Interventions. 13(10) |
ISSN: | 1941-7632 |
Description: | Background: Peripheral artery disease (PAD) is underrecognized, undertreated, and understudied: each of these endeavors requires efficient and accurate identification of patients with PAD. Currently, PAD patient identification relies on diagnosis/procedure codes or lists of patients diagnosed or treated by specific providers in specific locations and ways. The goal of this research was to leverage natural language processing to more accurately identify patients with PAD in an electronic health record system compared with a structured data–based approach. Methods: The clinical notes from a cohort of 6861 patients in our health system whose PAD status had previously been adjudicated were used to train, test, and validate a natural language processing model using 10-fold cross-validation. The performance of this model was described using the area under the receiver operating characteristic and average precision curves; its performance was quantitatively compared with an administrative data–based least absolute shrinkage and selection operator (LASSO) approach using the DeLong test. Results: The median (SD) of the area under the receiver operating characteristic curve for the natural language processing model was 0.888 (0.009) versus 0.801 (0.017) for the LASSO-based approach alone (DeLong P …). Conclusions: Using a natural language processing approach in addition to partial cohort preprocessing with a LASSO-based model, we were able to meaningfully improve our ability to identify patients with PAD compared with an approach using structured data alone. This model has potential applications to both interventions targeted at improving patient care as well as efficient, large-scale PAD research. Graphic Abstract: A graphic abstract is available for this article. |
Database: | OpenAIRE |
External link: |
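
The abstract above reports performance as a median (SD) of per-fold areas under the receiver operating characteristic curve, alongside average precision. The following is a minimal sketch of how such per-fold summaries could be computed, not the authors' implementation. It assumes that held-out scores for each fold have already been produced by a model fitted on the remaining folds, and that every fold contains both PAD-positive and PAD-negative patients. The function names (`auroc`, `average_precision`, `k_fold_indices`, `summarize_folds`) are hypothetical and chosen for illustration only.

```python
import random
import statistics


def auroc(labels, scores):
    """Area under the ROC curve as the probability that a random positive
    scores higher than a random negative (ties count as 0.5).
    Assumes both classes are present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive,
    with cases ranked by descending score. Assumes at least one positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def summarize_folds(labels, scores, folds):
    """Median and sample SD of the per-fold AUROC over held-out scores."""
    per_fold = [
        auroc([labels[i] for i in fold], [scores[i] for i in fold])
        for fold in folds
    ]
    return statistics.median(per_fold), statistics.stdev(per_fold)
```

In practice, `summarize_folds` would be called once per model (for example, once for the text-based model and once for the structured-data baseline) on the same folds, so that the two summaries are directly comparable. A formal comparison such as the DeLong test is outside the scope of this sketch.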