Evaluating Performance of ChatGPT on MKSAP Cardiology Board Review Questions.

Authors: Milutinovic S; Florida State University College of Medicine Internal Medicine Residency Program at Lee Health, Cape Coral, Florida, USA. Electronic address: stefan.milutinovic@leehealth.org; Petrovic M; Icahn School of Medicine at Mount Sinai, New York City, New York, USA; Begosh-Mayne D; Florida State University College of Medicine Internal Medicine Residency Program at Lee Health, Cape Coral, Florida, USA; Lopez-Mattei J; Lee Health Heart Institute, Fort Myers, Florida, USA; Chazal RA; Lee Health Heart Institute, Fort Myers, Florida, USA; Wood MJ; Lee Health Heart Institute, Fort Myers, Florida, USA; Escarcega RO; Florida State University College of Medicine Internal Medicine Residency Program at Lee Health, Cape Coral, Florida, USA; Lee Health Heart Institute, Fort Myers, Florida, USA; Florida Heart Associates, Fort Myers, Florida, USA.
Language: English
Source: International Journal of Cardiology [Int J Cardiol] 2024 Sep 19; Vol. 417, pp. 132576. Date of Electronic Publication: 2024 Sep 19.
DOI: 10.1016/j.ijcard.2024.132576
Abstract: Chat Generative Pretrained Transformer (ChatGPT) is a natural language processing tool created by OpenAI. Much of the discussion regarding artificial intelligence (AI) in medicine centers on the ability of language models to enhance medical practice, improve efficiency, and decrease errors. The objective of this study was to analyze the ability of ChatGPT to answer board-style cardiovascular medicine questions from the Medical Knowledge Self-Assessment Program (MKSAP). The study evaluated the performance of ChatGPT (versions 3.5 and 4), alongside internal medicine residents and internal medicine and cardiology attending physicians, in answering 98 multiple-choice questions (MCQs) from the Cardiovascular Medicine chapter of MKSAP. ChatGPT-4 achieved an accuracy of 74.5 %, comparable to that of an internal medicine (IM) intern (63.3 %), an IM senior resident (63.3 %), an IM attending physician (62.2 %), and ChatGPT-3.5 (64.3 %), but significantly lower than that of a cardiology attending physician (85.7 %). Subcategory analysis revealed no statistically significant differences between ChatGPT and physicians, with two exceptions: in valvular heart disease, the cardiology attending outperformed ChatGPT-3.5 (p = 0.031), and in heart failure, ChatGPT-4 outperformed the senior resident (p = 0.046). While ChatGPT shows promise in certain subcategories, establishing AI as a reliable educational tool for medical professionals will likely require ChatGPT's accuracy to surpass that of instructors, ideally approaching near-perfect scores on the questions posed.
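As a consistency check on the reported figures, each accuracy can be converted back into a whole number of correct answers out of the 98 MCQs. The abstract does not state these counts; the minimal sketch below derives them under two assumptions: that every respondent answered all 98 questions, and that the percentages were rounded to one decimal place. It does not attempt to reproduce the study's significance tests, whose method the abstract does not specify.

```python
# Reported accuracies (%) taken from the abstract.
# ASSUMPTION: every respondent answered the same 98 MCQs and percentages
# were rounded to one decimal place; the counts below are derived, not reported.
TOTAL_QUESTIONS = 98

reported = {
    "ChatGPT-4": 74.5,
    "ChatGPT-3.5": 64.3,
    "IM intern": 63.3,
    "IM senior resident": 63.3,
    "IM attending": 62.2,
    "Cardiology attending": 85.7,
}


def implied_correct(accuracy_pct, total=TOTAL_QUESTIONS):
    """Back-calculate the whole number of correct answers implied by a rounded percentage."""
    return round(accuracy_pct / 100 * total)


def accuracy(correct, total=TOTAL_QUESTIONS):
    """Accuracy in percent, rounded to one decimal place as in the abstract."""
    return round(100 * correct / total, 1)


implied = {name: implied_correct(pct) for name, pct in reported.items()}

for name, pct in reported.items():
    n = implied[name]
    # Round-trip check: the implied count must reproduce the reported percentage.
    assert accuracy(n) == pct, name
    print(f"{name}: {n}/{TOTAL_QUESTIONS} = {accuracy(n)} %")
```

Under these assumptions, every reported percentage corresponds to an exact count, for example 73/98 for ChatGPT-4 and 84/98 for the cardiology attending physician.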
Competing Interests: The authors report no relationships that could be construed as a conflict of interest.
(Copyright © 2024 Elsevier B.V. All rights reserved.)
Database: MEDLINE