Utilizing RAG and GPT-4 for Extraction of Substance Use Information from Clinical Notes.

Authors: Shah-Mohammadi F (Department of Biomedical Informatics, School of Medicine, University of Utah, USA); Finkelstein J (Department of Biomedical Informatics, School of Medicine, University of Utah, USA)
Language: English
Source: Studies in health technology and informatics [Stud Health Technol Inform] 2024 Nov 22; Vol. 321, pp. 94-98.
DOI: 10.3233/SHTI241070
Abstract: This research investigates the application of a hybrid Retrieval-Augmented Generation (RAG) and Generative Pre-trained Transformer (GPT) pipeline for extracting and categorizing substance use information from unstructured clinical notes. The aim is to enhance the accuracy and efficiency of identifying substance use mentions and determining their status in patient documentation. RAG pre-filters the input so that GPT analyzes only the most relevant text segments, which is intended to improve the precision and recall of the extraction. The pipeline was evaluated on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset through manual verification, using recall, precision, F1-score, and accuracy as metrics. The results showed high precision (up to 0.99 for drug and alcohol mentions) and substantial recall (0.88 across all substances for usage status).
Database: MEDLINE
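
Note: The abstract reports recall, precision, F1-score, and accuracy from manual verification. As a minimal sketch of how these four metrics follow from confusion-matrix counts, the Python below uses the standard definitions. The `Counts` class, the `metrics` function, and the example counts are hypothetical illustrations; they are not taken from the paper and do not reproduce its data or evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion-matrix counts from a manual review (hypothetical container)."""
    tp: int  # mentions correctly extracted
    fp: int  # extracted items that were not real mentions
    fn: int  # real mentions that were missed
    tn: int  # non-mentions correctly left out


def metrics(c: Counts) -> dict[str, float]:
    """Compute precision, recall, F1, and accuracy with the standard formulas."""
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    total = c.tp + c.fp + c.fn + c.tn

    # Guard each ratio against an empty denominator.
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (c.tp + c.tn) / total if total else 0.0

    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up counts for illustration only (not the study's results).
example = Counts(tp=45, fp=5, fn=15, tn=35)
result = metrics(example)
print(result)
```

With these made-up counts, precision is 45/50 and recall is 45/60. The example shows that the two metrics can diverge considerably for the same extractor, which is why the abstract reports them separately.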