Description: |
Abstract
Background: Since it was first launched, ChatGPT, a Large Language Model (LLM), has been widely used across different disciplines, particularly in the medical field.
Objective: The main aim of this review is to thoroughly assess the performance of the distinct versions of ChatGPT on subspecialty written medical proficiency exams and the factors that affect it.
Methods: Three online databases (PubMed, CINAHL, and Web of Science) were searched for articles that fit the objectives of the study. A group of reviewers was assembled to create an appropriate methodological framework for selecting the articles to be included.
Results: Sixteen articles were included in this review; they assessed the performance of different ChatGPT versions on subspecialty written examinations, such as surgery, neurology, orthopedics, trauma and orthopedics, core cardiology, family medicine, and dermatology. The studies reported different passing grades and rankings, with accuracy rates ranging from 35.8% to 91% across different datasets and subspecialties. The factors highlighted as affecting its correctness were: (1) the ChatGPT version; (2) the medical subspecialty; (3) the type of question; (4) the language; and (5) the comparators.
Conclusions: This review summarizes ChatGPT's performance on different medical specialty examinations and points to potential research investigating whether ChatGPT can enhance learning and support medical students preparing for a range of medical specialty exams. However, to avoid misuse and any detrimental effects on real-world medicine, it is crucial to be aware of its limitations and to continue the ongoing evaluation of this AI tool.