Abstract:
Objective: This study investigates the accuracy of large language models (LLMs) on magnetic resonance imaging (MRI) safety-related questions. Methods: Three experienced radiologists independently prepared 20 multiple-choice questions each, based on the MRI safety guidelines published by the Turkish Magnetic Resonance Society. An initial prompt was entered into 4 different LLMs (ChatGPT-3.5, ChatGPT-4, Gemini, and Perplexity), and a total of 60 questions was then asked of each model. The models' answers were compared with the answers assigned by the radiologists according to the guidelines, and the performance of each model was reported as accuracy. Results: Across the 60 questions, the accuracy rates were 78.3% (47/60) for ChatGPT-3.5, 93.3% (56/60) for ChatGPT-4, 88.3% (53/60) for Gemini, and 86.7% (52/60) for Perplexity. On the question groups prepared by the 3 radiologists, respectively, ChatGPT-3.5 answered 19/20, 13/20, and 15/20 correctly; ChatGPT-4 answered 18/20, 18/20, and 20/20; Gemini answered 19/20, 18/20, and 16/20; and Perplexity answered 20/20, 15/20, and 17/20. Conclusion: Large language models, particularly ChatGPT-4, which was the most stable and highest performing, may be useful to patients and health-care professionals in providing MRI safety-related information. They have the potential to help protect health-care professionals and patients from MRI-related accidents in the future.