Performance of ChatGPT and Radiology Residents on Ultrasonography Board-Style Questions

Author: Jiale Xu, MD, Shujun Xia, MD, Qing Hua, MD, Zihan Mei, MD, Yiqing Hou, MD, Minyan Wei, MD, Limei Lai, MD, Yixuan Yang, MD, Jianqiao Zhou, MD
Language: English
Publication year: 2024
Subject:
Source: Advanced Ultrasound in Diagnosis and Therapy, Vol 8, Iss 4, Pp 250-254 (2024)
Document type: article
ISSN: 2576-2516
DOI: 10.37015/AUDT.2024.240002
Description: Objective: This study aims to assess the performance of the Chat Generative Pre-Trained Transformer (ChatGPT), specifically versions GPT-3.5 and GPT-4, on ultrasonography board-style questions, and subsequently compare it with the performance of third-year radiology residents on the identical set of questions. Methods: The study, conducted from May 19 to May 30, 2023, utilized a selection of 134 multiple-choice questions sourced from a commercial question bank for American Registry for Diagnostic Medical Sonography (ARDMS) examinations and imported into the ChatGPT model (encompassing GPT-3.5 and GPT-4 versions). ChatGPT's responses were evaluated overall, by topic, and by GPT version. An identical question set was assigned to three third-year radiology residents, enabling a direct comparison of their performance with ChatGPT's. Results: GPT-4 correctly answered 82.1% of questions (110 of 134), significantly surpassing the performance of GPT-3.5 (P = 0.003), which correctly answered 66.4% of questions (89 of 134). Although GPT-3.5's performance was statistically indistinguishable from the average performance of the radiology residents (66.7%, 89.3 of 134) (P = 0.969), there was a notable difference in question-answering accuracy between GPT-4 and the residents (P = 0.004). Conclusions: ChatGPT demonstrated significant competency in responding to ultrasonography board-style questions, with the GPT-4 version markedly surpassing both its predecessor GPT-3.5 and the radiology residents.
Database: Directory of Open Access Journals
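
The abstract does not name the statistical test behind the reported P-values. For the GPT-4 vs GPT-3.5 comparison (110/134 vs 89/134 correct), a pooled two-proportion z-test, treating the two runs as independent samples, yields a two-sided P close to the reported 0.003. A minimal sketch under that assumption (the function name and the independence assumption are ours, not the paper's):

```python
from math import sqrt, erfc

def two_proportion_z(k1: int, k2: int, n: int) -> tuple[float, float]:
    """Pooled two-proportion z-test (two-sided), assuming independent samples.

    k1, k2: correct answers in each group; n: questions per group.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = k1 / n, k2 / n
    pooled = (k1 + k2) / (2 * n)              # pooled success proportion
    se = sqrt(pooled * (1 - pooled) * 2 / n)  # standard error of p1 - p2
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))          # P(|Z| > z) for standard normal
    return z, p_value

# GPT-4 (110 correct) vs GPT-3.5 (89 correct) on 134 questions each
z, p = two_proportion_z(110, 89, 134)
print(f"z = {z:.2f}, p = {p:.4f}")  # roughly z = 2.93, p = 0.003
```

Because both models answered the identical question set, the data are paired, and a paired test such as McNemar's would arguably be more appropriate; the unpaired sketch is used here only because per-question agreement counts are not given in the abstract.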