Enhancing Thyroid Pathology With Artificial Intelligence: Automated Data Extraction From Electronic Health Reports Using RUBY.

Authors: Culié D; Cervico-Facial Oncology Surgical Department, University Institute of Face and Neck, Centre Antoine Lacassagne University of Côte d'Azur, Nice, France.; Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France., Schiappa R; Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France., Contu S; Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France., Seutin E; Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France., Pace-Loscos T; Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France., Poissonnet G; Cervico-Facial Oncology Surgical Department, University Institute of Face and Neck, Centre Antoine Lacassagne University of Côte d'Azur, Nice, France., Villarme A; Cervico-Facial Oncology Surgical Department, University Institute of Face and Neck, Centre Antoine Lacassagne University of Côte d'Azur, Nice, France., Bozec A; Cervico-Facial Oncology Surgical Department, University Institute of Face and Neck, Centre Antoine Lacassagne University of Côte d'Azur, Nice, France., Chamorey E; Department of Epidemiology, Biostatistics and Health Data, Centre Antoine Lacassagne, University of Côte d'Azur, Nice, France.
Language: English
Source: JCO clinical cancer informatics [JCO Clin Cancer Inform] 2024 Dec; Vol. 8, pp. e2300263. Date of Electronic Publication: 2024 Dec 10.
DOI: 10.1200/CCI.23.00263
Abstract: Purpose: Thyroid nodules are common in the general population, and assessing their malignancy risk is the initial step in care. Surgical exploration remains the sole definitive option for indeterminate nodules. Extensive database access is crucial for improving this initial assessment. Our objective was to develop an automated process using convolutional neural networks (CNNs) to extract and structure biomedical insights from electronic health reports (EHRs) in a large thyroid pathology cohort.
Materials and Methods: We randomly selected 1,500 patients with thyroid pathology from our cohort for model development and an additional 100 for testing. We then divided the cohort of 1,500 patients into training (70%) and validation (30%) sets. We used EHRs from initial surgeon visits, preanesthesia visits, ultrasound, surgery, and anatomopathology reports. We selected 42 variables of interest and had them manually annotated by a clinical expert. We developed RUBY-THYRO using six distinct CNN models from SpaCy, supplemented with keyword extraction rules and postprocessing. Evaluation against a gold standard database included calculating precision, recall, and F1 score.
Results: Performance remained consistent across the test and validation sets, with the majority of variables (30/42) exceeding 90% on all three metrics in both sets. Results differed by variable: in the test set, pathologic tumor stage achieved 100% precision, recall, and F1 score, whereas the number of nodules achieved 45%, 28%, and 32%, respectively. Surgical and preanesthesia reports demonstrated particularly high performance.
Conclusion: Our study successfully implemented a CNN-based natural language processing (NLP) approach for extracting and structuring data from various EHRs in thyroid pathology. This highlights the potential of artificial intelligence-driven NLP techniques for extensive and cost-effective data extraction, paving the way for creating comprehensive, hospital-wide data warehouses.
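Illustrative note: the 70%/30% split and the per-variable precision, recall, and F1 evaluation described in Materials and Methods can be sketched as follows. This is a minimal sketch and not the authors' code. The function names, the random seed, and the counting convention (an extracted value counts as correct only if it exactly matches the gold-standard value; a wrong value counts as both a false positive and a false negative) are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence, Tuple


def split_train_validation(ids: Sequence[int],
                           train_fraction: float = 0.7,
                           seed: int = 0) -> Tuple[list, list]:
    """Shuffle patient ids and split them into training and validation sets.

    The seed is a hypothetical choice so that the split is reproducible.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(gold: Sequence[Optional[str]],
                        predicted: Sequence[Optional[str]]
                        ) -> Tuple[float, float, float]:
    """Score one variable across reports against the gold standard.

    None means "no value" (absent in the gold standard, or not extracted).
    Assumed convention: exact match is a true positive; an extracted value
    that does not match is a false positive; a gold value that was not
    reproduced is a false negative.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:
                fp += 1
            if g is not None:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    train_ids, validation_ids = split_train_validation(range(1500))
    print(len(train_ids), len(validation_ids))

    # Toy gold-standard and extracted values for a single variable.
    gold_values = ["pT1", "pT2", None, "pT3"]
    extracted_values = ["pT1", "pT3", "pT1", None]
    print(precision_recall_f1(gold_values, extracted_values))
```

Scoring each of the 42 variables separately in this way is what allows per-variable comparisons such as those reported in Results.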
Database: MEDLINE