Evaluation of GPT Large Language Model Performance on RSNA 2023 Case of the Day Questions.

Authors: Mukherjee P, Hou B, Suri A, Zhuang Y, Parnell C, Lee N, Stroie O, Jain R, Wang KC, Sharma K, Summers RM
Affiliations: From the Department of Radiology and Imaging Sciences, Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, NIH Clinical Center, 10 Center Dr, Bldg 10, Rm 1C224D, Bethesda, MD 20892-1182 (P.M., B.H., A.S., Y.Z., R.M.S.); Walter Reed National Military Medical Center, Bethesda, Md (C.P., N.L., O.S.); Radiologic Associates of Middletown, Middletown, Conn (R.J., K.S.); and Baltimore VA Medical Center, Baltimore, Md (K.C.W.).
Language: English
Source: Radiology 2024 Oct; Vol. 313 (1), pp. e240609.
DOI: 10.1148/radiol.240609
Abstract:
Background: GPT-4V (GPT-4 with vision, ChatGPT; OpenAI) has shown impressive performance in several medical assessments. However, few studies have assessed its performance in interpreting radiologic images.
Purpose: To compare the accuracy of GPT-4V with that of radiologists and residents on radiologic cases that include both images and textual context, to determine whether GPT-4V assistance improves human accuracy, and to compare the accuracy of GPT-4V when given image-only or text-only inputs.
Materials and Methods: This observer study used 72 curated Case of the Day questions from the RSNA 2023 Annual Meeting. Answers from GPT-4V were obtained between November 26 and December 10, 2023, under three input conditions for each question: image only, text only, and both text and images. Five radiologists and three residents also answered the questions in an "open book" setting. For the artificial intelligence (AI)-assisted portion, the radiologists and residents were provided with the outputs of GPT-4V. The accuracy of radiologists and residents, both with and without AI assistance, was analyzed using a mixed-effects linear model. The accuracies of GPT-4V with different input combinations were compared using the McNemar test. P < .05 was considered to indicate a significant difference.
Results: The accuracy of GPT-4V was 43% (31 of 72; 95% CI: 32, 55). Radiologists and residents did not significantly outperform GPT-4V in either imaging-dependent (59% and 56% vs 39%; P = .31 and .52, respectively) or imaging-independent (76% and 63% vs 70%; both P = .99) cases. There was no evidence that access to GPT-4V responses improved the average accuracy of the readers. The accuracy of GPT-4V with text-only and image-only inputs was 50% (35 of 70; 95% CI: 39, 61) and 38% (26 of 69; 95% CI: 27, 49), respectively.
Conclusion: The radiologists and residents did not significantly outperform GPT-4V. Assistance from GPT-4V did not help the human readers. GPT-4V relied on the textual context for its outputs.
© RSNA, 2024. Supplemental material is available for this article. See also the editorial by Katz in this issue.
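The Results report each accuracy as a proportion with a 95% CI, and the Methods compare GPT-4V input conditions with the McNemar test. The Python sketch below illustrates both calculations; it is not the authors' analysis code. The abstract does not name its confidence-interval method, so the Wilson score interval is an assumption here, chosen because it reproduces the reported bounds after rounding. The abstract also does not say whether an exact or a continuity-corrected chi-square McNemar test was used, so both common variants are shown. The function names `wilson_ci`, `mcnemar_exact`, and `mcnemar_cc` are hypothetical.

```python
import math
from statistics import NormalDist


def wilson_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from discordant-pair counts.

    b: questions correct under input condition A but not under B
    c: questions correct under input condition B but not under A
    Concordant pairs (both right or both wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to favor A or B.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_cc(b: int, c: int) -> float:
    """McNemar chi-square P value with continuity correction (1 df)."""
    if b + c == 0:
        return 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


# Proportions reported in the abstract: (label, correct answers, questions answered).
for label, k, n in [
    ("GPT-4V", 31, 72),
    ("text-only input", 35, 70),
    ("image-only input", 26, 69),
]:
    lo, hi = wilson_ci(k, n)
    print(f"{label}: {k / n:.0%} (95% CI: {lo:.0%}, {hi:.0%})")
# GPT-4V: 43% (95% CI: 32%, 55%)
# text-only input: 50% (95% CI: 39%, 61%)
# image-only input: 38% (95% CI: 27%, 49%)
```

The abstract does not report the discordant-pair counts, so the McNemar functions are shown without real inputs. Because the test is paired, only questions that received an answer under both input conditions being compared would contribute to `b` and `c`.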
Database: MEDLINE