Identification of muscle-invasion status in bladder cancer patients using natural language processing and machine learning
Authors: | Ruixin Yang, Di Zhu, Lauren Howard, Amanda M. De Hoedt, Zachary William Abraham Klaassen, Stephen J. Freedland, Stephen B. Williams |
---|---|
Year of publication: | 2022 |
Subject: | |
Source: | Journal of Clinical Oncology. 40:447-447 |
ISSN: | 1527-7755 0732-183X |
DOI: | 10.1200/jco.2022.40.6_suppl.447 |
Description: | 447 Background: Mortality from bladder cancer (BC) increases exponentially once the tumor invades the muscle. At the population level, accurately identifying these patients is challenging. Methods: We developed and validated a natural language processing (NLP) model that automatically identifies muscle-invasive BC (MIBC) patients, to aid population-based BC research. All patients with a CPT code for transurethral resection of bladder tumor (TURBT) (N = 76,060) were selected from the Department of Veterans Affairs (VA) Corporate Data Warehouse database. A sample of 600 patients (with 2,337 full-text notes) who had TURBT and confirmed pathology results was selected for NLP model development (500 patients) and validation (100 patients). Each case was labeled as muscle-invasive, non-muscle-invasive, unknown, or no cancer by detailed chart review of pathology notes. NLP performance was assessed by calculating sensitivity, positive predictive value (PPV), and overall accuracy at the individual note and patient levels. Results: In the validation cohort, the NLP model had overall accuracy of 88% and 92% at the note and patient levels, respectively. Specifically, PPV and sensitivity for predicting muscle invasion at the note level were 83% and 70%, respectively. The model classified non-muscle-invasive BC (NMIBC) with 98% sensitivity at both the note and patient levels. Although the sensitivity for MIBC was 70% at the note level, it rose to 86% at the patient level. When applied to 71,200 patients VA-wide, the model classified 13,642 (19%) as having MIBC and 47,595 (66%) as having NMIBC. The NLP model identified invasion status for 96% of TURBT patients at the population level. Inherent limitations include a relatively small training set given the size of the VA population. Conclusions: This NLP model for identifying muscle invasion at the population level had high accuracy. The NLP model may be a practical and accurate tool for efficiently identifying BC invasion status and may aid population-based BC research in the VA. |
Database: | OpenAIRE |
External link: |
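
The abstract reports sensitivity, PPV, and overall accuracy at both the note level and the patient level, but does not say how note-level predictions were combined into a patient-level label. The sketch below shows one way such an evaluation could be computed. The aggregation rule (a patient counts as MIBC if any of their notes is labeled MIBC), the function names `metrics` and `to_patient_level`, and the toy data are all assumptions made for illustration; they are not taken from the study.

```python
from collections import defaultdict

# ASSUMPTION: priority order for collapsing note labels into one patient label.
# Any MIBC note makes the patient MIBC, then NMIBC, and so on down the list.
PRIORITY = ["MIBC", "NMIBC", "no cancer", "unknown"]


def metrics(y_true, y_pred, positive="MIBC"):
    """Sensitivity and PPV for the `positive` class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "accuracy": correct / len(pairs) if pairs else nan,
    }


def to_patient_level(patient_ids, labels):
    """Collapse note-level labels to one label per patient (assumed rule)."""
    by_patient = defaultdict(set)
    for pid, label in zip(patient_ids, labels):
        by_patient[pid].add(label)
    return {
        pid: next((c for c in PRIORITY if c in seen), "unknown")
        for pid, seen in by_patient.items()
    }


# Toy data (hypothetical, not from the study): four notes from three patients.
note_patient = ["p1", "p1", "p2", "p3"]
note_true = ["MIBC", "MIBC", "NMIBC", "MIBC"]
note_pred = ["MIBC", "NMIBC", "NMIBC", "MIBC"]

note_metrics = metrics(note_true, note_pred)

true_by_patient = to_patient_level(note_patient, note_true)
pred_by_patient = to_patient_level(note_patient, note_pred)
patients = sorted(true_by_patient)
patient_metrics = metrics(
    [true_by_patient[p] for p in patients],
    [pred_by_patient[p] for p in patients],
)

print("note level:   ", note_metrics)
print("patient level:", patient_metrics)
```

In this toy example one of patient p1's two MIBC notes is misclassified, which lowers note-level sensitivity, yet the patient is still labeled MIBC because the other note is caught. Under the assumed rule, this is one plausible reason patient-level sensitivity can exceed note-level sensitivity, as the abstract reports (86% versus 70%).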