Natural Language Processing Algorithm to Extract Multiple Myeloma Stage From Oncology Notes in the Veterans Affairs Healthcare System.

Authors: Goryachev, Sergey D.; Yildirim, Cenk; DuMontier, Clark; La, Jennifer; Dharne, Mayuri; Gaziano, J. Michael; Brophy, Mary T.; Munshi, Nikhil C.; Driver, Jane A.; Do, Nhan V.; Fillmore, Nathanael R.
Source: JCO Clinical Cancer Informatics; 7/22/2024, Vol. 8, p1-7, 7p
Abstract: PURPOSE: Stage in multiple myeloma (MM) is an essential measure of disease risk, but its measurement in large databases is often lacking. We aimed to develop and validate a natural language processing (NLP) algorithm to extract oncologists' documentation of stage in the national Veterans Affairs (VA) Healthcare System. METHODS: Using nationwide electronic health record (EHR) and cancer registry data from the VA Corporate Data Warehouse, we developed and validated a rule-based NLP algorithm to extract oncologist-determined MM stage. To that end, a clinician annotated MM stage within over 5,000 short snippets of clinical notes, and annotated MM stage at MM treatment initiation for 200 patients. These were allocated into snippet- and patient-level development and validation sets. We developed MM stage extraction and roll-up algorithms within the development sets. After the algorithms were finalized, we validated them using standard measures in held-out validation sets. RESULTS: We developed algorithms for three different MM staging systems that have been in widespread use (Revised International Staging System [R-ISS], International Staging System [ISS], and Durie-Salmon [DS]) and for stage reported without a clearly defined system. Precision and recall were uniformly high for MM stage at the snippet level, ranging from 0.92 to 0.99 for the different MM staging systems. Performance in identifying MM stage at treatment initiation at the patient level was also excellent, with precision of 0.92, 0.96, 0.90, and 0.86 and recall of 0.99, 0.98, 0.94, and 0.92 for R-ISS, ISS, DS, and unclear stage, respectively. CONCLUSION: Our MM stage extraction algorithm uses rule-based NLP and data aggregation to accurately measure MM stage documented in oncology notes and pathology reports in VA's national EHR system. It may be adapted to other systems where MM stage is recorded in clinical notes.
Stage is a key factor in the assessment of multiple myeloma (MM) risk, but large databases lack the necessary laboratory and pathology data for staging. We developed a rule-based NLP algorithm to extract oncologist-determined MM stage from VA EHR data. The algorithm demonstrated accurate assignment of MM stage when validated against clinician annotation. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
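
Illustrative sketch (not part of the record): the abstract describes rule-based extraction of MM stage from note snippets, a roll-up to the patient level, and validation by precision and recall, but it does not give the rules themselves. The Python below is a minimal sketch of that general approach under stated assumptions. The regular expressions, the majority-vote roll-up, and the names `PATTERNS`, `extract_stage`, `roll_up`, and `precision_recall` are all hypothetical and are not the authors' algorithm.

```python
import re
from collections import Counter

# Map Roman or Arabic stage tokens to an integer stage.
STAGE_VALUE = {"i": 1, "ii": 2, "iii": 3, "1": 1, "2": 2, "3": 3}
_NUM = r"(?P<stage>iii|ii|i|[123])"
_SEP = r"\s*(?:stage\s*)?[:\-]?\s*"

# Hypothetical rules, checked in priority order (most specific first).
PATTERNS = [
    ("R-ISS", re.compile(
        r"\b(?:r-iss|revised\s+iss|revised\s+international\s+staging\s+system)"
        + _SEP + _NUM + r"\b", re.I)),
    ("ISS", re.compile(
        r"(?<![-\w])(?:iss|international\s+staging\s+system)"
        + _SEP + _NUM + r"\b", re.I)),
    # Durie-Salmon stages may carry an A/B suffix, which is ignored here.
    ("DS", re.compile(
        r"\b(?:durie[\s\-]+salmon|ds)" + _SEP + _NUM + r"[ab]?\b", re.I)),
    # Stage mentioned without a named staging system.
    ("unclear", re.compile(r"\bstage\s*[:\-]?\s*" + _NUM + r"[ab]?\b", re.I)),
]


def extract_stage(snippet):
    """Return (system, stage) for the first rule that matches, else None."""
    for system, pattern in PATTERNS:
        match = pattern.search(snippet)
        if match:
            return system, STAGE_VALUE[match.group("stage").lower()]
    return None


def roll_up(snippets):
    """Patient-level stage per system by majority vote (an assumed rule)."""
    votes = {}
    for snippet in snippets:
        hit = extract_stage(snippet)
        if hit:
            system, stage = hit
            votes.setdefault(system, Counter())[stage] += 1
    return {system: c.most_common(1)[0][0] for system, c in votes.items()}


def precision_recall(predicted, gold):
    """Precision and recall over paired labels; None means no stage."""
    pairs = list(zip(predicted, gold))
    tp = sum(p is not None and p == g for p, g in pairs)
    fp = sum(p is not None and p != g for p, g in pairs)
    fn = sum(g is not None and p != g for p, g in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, `extract_stage("Durie-Salmon stage IIIA")` yields `("DS", 3)` under these rules, and `roll_up` keeps the most frequently extracted stage for each staging system. A production system would need far richer rules (negation, historical mentions, note dates relative to treatment initiation) than this sketch attempts.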