Effectiveness of AI-powered chatbots in responding to orthopaedic postgraduate exam questions: an observational study.

Author: Vaishya R; Department of Orthopaedics, Indraprastha Apollo Hospitals, Sarita Vihar, New Delhi, 110076, India. raju.vaishya@gmail.com., Iyengar KP; Department of Orthopaedics, Southport and Ormskirk Hospital, Mersey West Lancashire Teaching NHS Trust, Southport, UK., Patralekh MK; Department of Orthopaedics, Safdarjung Hospital, New Delhi, India., Botchu R; Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Birmingham, UK., Shirodkar K; Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Birmingham, UK., Jain VK; Department of Orthopaedics, RML Hospital, New Delhi, India., Vaish A; Department of Orthopaedics, Indraprastha Apollo Hospitals, Sarita Vihar, New Delhi, 110076, India., Scarlat MM; Clinique Chirurgicale St Michel, Groupe ELSAN Toulon, France.
Language: English
Source: International orthopaedics [Int Orthop] 2024 Aug; Vol. 48 (8), pp. 1963-1969. Date of Electronic Publication: 2024 Apr 15.
DOI: 10.1007/s00264-024-06182-9
Abstract: Purpose: This study analyses the performance and proficiency of three Artificial Intelligence (AI) generative chatbots (ChatGPT-3.5, ChatGPT-4.0 and Bard Google AI®) in answering the Multiple Choice Questions (MCQs) of postgraduate (PG) level orthopaedic qualifying examinations.
Methods: A series of 120 mock 'Single Best Answer' (SBA) MCQs, each with four answer options labelled A, B, C and D, on various musculoskeletal (MSK) conditions covering the Trauma and Orthopaedic curricula were compiled. A standardised text prompt was used to feed each question to ChatGPT (versions 3.5 and 4.0) and Google Bard, and the responses were then statistically analysed.
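To make the prompting workflow concrete, the sketch below shows how a standardised prompt might be sent to a chatbot and the selected option recorded. The client library, model identifier, and prompt wording are illustrative assumptions; the abstract does not specify how the prompts were delivered to each platform.

```python
# Minimal sketch of an MCQ prompting workflow (assumed details: the
# OpenAI Python client, model name, and prompt wording are illustrative
# choices, not taken from the study).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_TEMPLATE = (
    "You are sitting a postgraduate orthopaedic qualifying examination.\n"
    "Answer the following single-best-answer MCQ with one letter only "
    "(A, B, C or D).\n\n{question}"
)

def ask_mcq(question: str, model: str = "gpt-4") -> str:
    """Send one SBA MCQ to the model and return its chosen option."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(question=question)}],
    )
    return response.choices[0].message.content.strip()

# Each of the 120 MCQs would be fed to every chatbot in turn and the
# returned letter scored against the answer key.
```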
Results: Significant differences were found between the responses of ChatGPT 3.5 and ChatGPT 4.0 (Chi-square = 27.2, P < 0.001), and on comparing both ChatGPT 3.5 (Chi-square = 63.852, P < 0.001) and ChatGPT 4.0 (Chi-square = 44.246, P < 0.001) with Bard Google AI®. Bard Google AI® answered all questions correctly (100% efficiency) and was significantly more efficient than both ChatGPT 3.5 and ChatGPT 4.0 (P < 0.0001).
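Chi-square comparisons of this kind can be computed from a 2x2 contingency table of correct versus incorrect answers per chatbot, as in the sketch below. The per-model correct counts used here are hypothetical placeholders, since the abstract reports only the test statistics:

```python
# Chi-square comparison of two chatbots' MCQ performance.
# The counts below are hypothetical placeholders; the abstract gives
# only the test statistics, not the underlying correct/incorrect tallies.
from scipy.stats import chi2_contingency

n_questions = 120
correct = {"ChatGPT-3.5": 60, "ChatGPT-4.0": 88}  # assumed values

table = [
    [correct["ChatGPT-3.5"], n_questions - correct["ChatGPT-3.5"]],
    [correct["ChatGPT-4.0"], n_questions - correct["ChatGPT-4.0"]],
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"Chi-square = {chi2:.3f}, P = {p:.4g}, dof = {dof}")
```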
Conclusion: The results demonstrate the variable potential of the different AI generative chatbots (ChatGPT 3.5, ChatGPT 4.0 and Google Bard) in their ability to answer the MCQs of PG-level orthopaedic qualifying examinations. Bard Google AI® showed superior performance to both ChatGPT versions, underlining the potential of such large language models in processing and applying orthopaedic subspecialty knowledge at a PG level.
(© 2024. The Author(s) under exclusive licence to SICOT aisbl.)
Database: MEDLINE