Extraction and classification of structured data from unstructured hepatobiliary pathology reports using large language models: a feasibility study compared with rules-based natural language processing.

Authors: Geevarghese R; Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA., Sigel C; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA., Cadley J; Artificial Intelligence & Machine Learning, Digital, Informatics and Technology Solutions (DigITs), Memorial Sloan Kettering Cancer Center, New York, New York, USA., Chatterjee S; Artificial Intelligence & Machine Learning, Digital, Informatics and Technology Solutions (DigITs), Memorial Sloan Kettering Cancer Center, New York, New York, USA., Jain P; Artificial Intelligence & Machine Learning, Digital, Informatics and Technology Solutions (DigITs), Memorial Sloan Kettering Cancer Center, New York, New York, USA., Hollingsworth A; Artificial Intelligence & Machine Learning, Digital, Informatics and Technology Solutions (DigITs), Memorial Sloan Kettering Cancer Center, New York, New York, USA., Chatterjee A; Artificial Intelligence & Machine Learning, Digital, Informatics and Technology Solutions (DigITs), Memorial Sloan Kettering Cancer Center, New York, New York, USA., Swinburne N; Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA., Bilal KH; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York, USA., Marinelli B; Division of Interventional Radiology, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, New York, USA marinelb@mskcc.org.
Language: English
Source: Journal of clinical pathology [J Clin Pathol] 2024 Sep 20. Date of Electronic Publication: 2024 Sep 20.
DOI: 10.1136/jcp-2024-209669
Abstract: Aims: Structured reporting in pathology is not universally adopted, and extracting the elements essential to research often requires expensive, time-intensive manual curation. This study examines the accuracy and feasibility of using large language models (LLMs) to extract pathology elements essential for cancer research.
Methods: This retrospective study included patients who underwent pathology sampling for suspected hepatocellular carcinoma and underwent Yttrium-90 embolisation. Five pathology report elements of interest were selected for evaluation. Two LLMs, Generative Pre-trained Transformer (GPT)-3.5 Turbo and GPT-4, were used to extract these elements. For comparison, a rules-based approach using regular expressions (REGEX) was developed. The accuracy of each approach was calculated.
Results: A total of 88 pathology reports were identified. Both the LLMs and REGEX extracted the research elements with high accuracy (average accuracy, 84.1%-94.8%).
Conclusions: LLMs have significant potential to simplify the extraction of research elements from pathology reports and thereby accelerate the pace of cancer research.
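The abstract does not name the five report elements or the expressions used for the REGEX comparator, so the following is only a minimal Python sketch of what rules-based extraction and per-element accuracy scoring can look like. The three elements (`diagnosis`, `differentiation`, `fibrosis_stage`), their patterns, the helper names `extract` and `element_accuracy`, and the toy reports are all hypothetical. It also assumes that accuracy means the share of reports whose extracted value matches a manually curated reference, and that the "average" is the mean across elements. It is not the authors' implementation.

```python
import re
from statistics import mean
from typing import Optional

# HYPOTHETICAL elements and patterns for illustration only; the study's
# actual five elements and regular expressions are not given in the abstract.
PATTERNS: dict[str, re.Pattern] = {
    "diagnosis": re.compile(
        r"\b(hepatocellular carcinoma|cholangiocarcinoma)\b", re.IGNORECASE),
    "differentiation": re.compile(
        r"\b(well|moderately|poorly)[\s-]+differentiated\b", re.IGNORECASE),
    "fibrosis_stage": re.compile(
        r"\bfibrosis\b[^.]*?\bstage\s*([0-4])\b", re.IGNORECASE),
}


def extract(report: str) -> dict[str, Optional[str]]:
    """Return capture group 1 (lower-cased) of the first match per element, or None."""
    values: dict[str, Optional[str]] = {}
    for element, pattern in PATTERNS.items():
        match = pattern.search(report)
        values[element] = match.group(1).lower() if match else None
    return values


def element_accuracy(predicted: list[dict[str, Optional[str]]],
                     gold: list[dict[str, Optional[str]]],
                     element: str) -> float:
    """Fraction of reports whose extracted value equals the manual reference (assumed metric)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same reports")
    correct = sum(p[element] == g[element] for p, g in zip(predicted, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # Toy reports invented for this sketch; they are not study data.
    reports = [
        "Liver, core biopsy: Hepatocellular carcinoma, moderately "
        "differentiated. Background fibrosis, stage 3.",
        "Liver mass, biopsy: Poorly-differentiated cholangiocarcinoma.",
        "Negative for hepatocellular carcinoma. Benign hepatic parenchyma.",
    ]
    gold = [
        {"diagnosis": "hepatocellular carcinoma",
         "differentiation": "moderately", "fibrosis_stage": "3"},
        {"diagnosis": "cholangiocarcinoma",
         "differentiation": "poorly", "fibrosis_stage": None},
        {"diagnosis": None, "differentiation": None, "fibrosis_stage": None},
    ]
    predicted = [extract(r) for r in reports]
    per_element = {e: element_accuracy(predicted, gold, e) for e in PATTERNS}
    print(per_element)
    print(f"average accuracy: {mean(per_element.values()):.1%}")  # 88.9%
```

The third toy report shows a familiar weakness of plain pattern matching: "Negative for hepatocellular carcinoma" still matches the diagnosis pattern, so that element scores 2 of 3. Handling negation and free-text variation usually means adding more rules, and this is the kind of manual effort that LLM-based extraction aims to reduce.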
Competing interests: None declared.
(© Author(s) (or their employer(s)) 2024. No commercial re-use. See rights and permissions. Published by BMJ.)
Database: MEDLINE