Large Language Models with Vision on Diagnostic Radiology Board Exam Style Questions.
Author: Sun SH (Electronic address: shawnsun25@gmail.com); Chen K; Anavim S; Phillipi M; Yeh L; Huynh K; Cortes G; Tran J; Tran M; Yaghmai V; Houshyar R (all: University of California Irvine, Radiology Department, UCI Medical Center, Orange, California, USA)
Language: English
Source: Academic Radiology [Acad Radiol], 2024 Dec 03 (date of electronic publication).
DOI: 10.1016/j.acra.2024.11.028
Abstract:

Rationale and Objectives: The expansion of large language models to process images offers new avenues for application in radiology. This study aims to assess the multimodal capabilities of contemporary large language models, which can analyze image inputs in addition to textual data, on radiology board-style examination questions with images.

Materials and Methods: A total of 280 questions were retrospectively selected from the AuntMinnie public test bank. The test questions were converted into three prompt formats: (1) multimodal, (2) image-only, and (3) text-only input. Three models, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet, were evaluated using these prompts. The Cochran Q test and pairwise McNemar tests were used to compare performance between prompt formats and models.

Results: No significant difference in accuracy (percentage of correct answers) was found among the text, image, and multimodal prompt formats for GPT-4V (54%, 52%, and 57%, respectively; p = .31) or Gemini 1.5 Pro (53%, 54%, and 57%, respectively; p = .53). For Claude 3.5 Sonnet, the image-only input (48%) significantly underperformed both the text-only input (63%, p < .001) and the multimodal input (66%, p < .001), but no difference was found between the text-only and multimodal inputs (p = .29). Claude significantly outperformed GPT and Gemini in the text and multimodal formats (p < .01).

Conclusion: Vision-capable large language models cannot effectively use images to improve performance on radiology board-style examination questions. When using textual data alone, Claude 3.5 Sonnet outperforms GPT-4V and Gemini 1.5 Pro, highlighting advancements in the field and the model's potential for use in further research.

Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. (Copyright © 2024 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.)
Database: MEDLINE
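Note: the statistical comparison described in the abstract (an omnibus Cochran Q test across the three prompt formats, followed by pairwise McNemar tests) maps onto standard library routines. Below is a minimal Python sketch, not the authors' code: the per-question correctness vectors are hypothetical placeholder data standing in for the 280 binary outcomes per prompt format.

```python
# Minimal sketch of the analysis described in the abstract.
# Placeholder data only; the study's actual results are not reproduced here.
import numpy as np
from statsmodels.stats.contingency_tables import cochrans_q, mcnemar

rng = np.random.default_rng(0)
n_questions = 280  # the study used 280 AuntMinnie questions

# 0/1 correctness per question for each prompt format (hypothetical)
text = rng.integers(0, 2, n_questions)
image = rng.integers(0, 2, n_questions)
multimodal = rng.integers(0, 2, n_questions)

# Cochran's Q: omnibus test across three related binary samples
q_result = cochrans_q(np.column_stack([text, image, multimodal]))
print(f"Cochran's Q = {q_result.statistic:.2f}, p = {q_result.pvalue:.3f}")

# Pairwise McNemar tests on 2x2 agreement/disagreement tables
def mcnemar_pair(a, b):
    table = np.array([
        [np.sum((a == 1) & (b == 1)), np.sum((a == 1) & (b == 0))],
        [np.sum((a == 0) & (b == 1)), np.sum((a == 0) & (b == 0))],
    ])
    return mcnemar(table, exact=True).pvalue

print("text vs image:       p =", mcnemar_pair(text, image))
print("text vs multimodal:  p =", mcnemar_pair(text, multimodal))
print("image vs multimodal: p =", mcnemar_pair(image, multimodal))
```

In practice, pairwise McNemar p-values are usually adjusted for multiple comparisons (e.g., Bonferroni); the abstract does not state which correction, if any, was applied.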