Rule-based natural language processing for automation of stroke data extraction: a validation study

Authors:	Dane Gunter; Paulo Puac-Polanco; Olivier Miguel; Rebecca E Thornhill; Amy Y X Yu; Zhongyu A Liu; Muhammad Mamdani; Chloe Pou-Prom; Richard I Aviv
Year of publication:	2022
Subject:	Stroke; Automation; Humans; Algorithms; Natural Language Processing; Ischemic Stroke
Source:	Neuroradiology. 64(12)
ISSN:	1432-1920
Description:	Manual data extraction from free-text radiology reports is time-consuming. More automated extraction methods using natural language processing (NLP) have recently been proposed. A previously developed rule-based NLP algorithm showed promise in its ability to extract stroke-related data from radiology reports. We aimed to externally validate the accuracy of CHARTextract, a rule-based NLP algorithm, in extracting stroke-related data from free-text radiology reports. Free-text reports of CT angiography (CTA) and CT perfusion (CTP) studies of consecutive patients with acute ischemic stroke admitted to a regional stroke center for endovascular thrombectomy from January 2015 to 2021 were analyzed. Stroke-related variables were manually extracted from the clinical reports as the reference standard, including proximal and distal anterior circulation occlusion, posterior circulation occlusion, presence of ischemia or hemorrhage, Alberta Stroke Program Early CT Score (ASPECTS), and collateral status. The same variables were also extracted using the rule-based NLP algorithm. The algorithm's accuracy, specificity, sensitivity, positive predictive value (PPV), and negative predictive value (NPV) were assessed. The NLP algorithm's accuracy was 90% for identifying distal anterior circulation occlusion, posterior circulation occlusion, hemorrhage, and ASPECTS. Accuracy was 85%, 74%, and 79% for proximal anterior circulation occlusion, presence of ischemia, and collateral status, respectively. The algorithm confirmed the absence of variables from radiology reports with 87-100% accuracy. Rule-based NLP has moderate to good performance for extracting stroke-related data from free-text imaging reports. The algorithm's accuracy was affected by inconsistent report styles and lexicon among reporting radiologists.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=pmid________::8f7fac6fc58d6e33e95b437fd630f13b https://pubmed.ncbi.nlm.nih.gov/35913525
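The abstract does not describe CHARTextract's actual rules. The Python sketch below is therefore only a rough illustration of the two ideas the study combines. The first is a keyword rule that flags one variable (here, hemorrhage) in a report. The second is scoring that rule against manual reference labels with the standard confusion-matrix definitions of accuracy, sensitivity, specificity, PPV, and NPV. The term lists, the naive negation check, the names `rule_says_hemorrhage`, `evaluate`, `TOY_REPORTS`, and `TOY_REFERENCE`, and the toy reports themselves are hypothetical and invented for illustration. They do not come from the study or from CHARTextract.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword rule for one variable (hemorrhage); NOT CHARTextract's actual rules.
HEMORRHAGE_TERMS = re.compile(r"\b(hemorrhage|haemorrhage|hematoma|bleed)\b", re.IGNORECASE)
# Naive negation cues: if one appears earlier in the same sentence, the mention counts as negated.
NEGATION_CUES = re.compile(r"\b(no|without|negative for|absence of)\b", re.IGNORECASE)


def rule_says_hemorrhage(report: str) -> bool:
    """Return True if, in some sentence, the first hemorrhage term has no negation cue before it."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = HEMORRHAGE_TERMS.search(sentence)  # first hemorrhage term in the sentence, if any
        if match and not NEGATION_CUES.search(sentence[: match.start()]):
            return True
    return False


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def evaluate(predicted: list[bool], reference: list[bool]) -> Metrics:
    """Compare algorithm output with manual reference labels via a 2x2 confusion matrix."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)          # rule positive, reference positive
    tn = sum(not p and not r for p, r in pairs)  # rule negative, reference negative
    fp = sum(p and not r for p, r in pairs)      # rule positive, reference negative
    fn = sum(not p and r for p, r in pairs)      # rule negative, reference positive

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # undefined when the denominator is zero

    return Metrics(
        accuracy=ratio(tp + tn, tp + tn + fp + fn),
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


# Invented toy reports and manual labels, for illustration only (not study data).
TOY_REPORTS = [
    "No acute hemorrhage. Occlusion of the left M1 segment.",
    "Small hemorrhage in the right basal ganglia.",
    "Intraparenchymal blood products in the left frontal lobe.",  # wording outside the rule's term list
    "Acute subarachnoid bleed.",
    "There is no evidence of hematoma.",
]
TOY_REFERENCE = [False, True, True, True, False]

if __name__ == "__main__":
    predicted = [rule_says_hemorrhage(r) for r in TOY_REPORTS]
    print(evaluate(predicted, TOY_REFERENCE))
```

On these toy inputs the rule misses the third report, whose wording ("blood products") is not in its term list. The result is an accuracy of 0.8 and a sensitivity of 2/3, with specificity and PPV of 1.0. This is a miniature version of the abstract's point that variation in radiologists' lexicon limits a rule-based extractor.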