The scientific knowledge of three large language models in cardiology: multiple-choice questions examination-based performance.
Author: | Altamimi I; College of Medicine.; Evidence-Based Health Care and Knowledge Translation Research Chair, Family and Community Medicine Department, College of Medicine, King Saud University., Alhumimidi A; College of Medicine., Alshehri S; College of Medicine., Alrumayan A; College of Medicine, King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia., Al-Khlaiwi T; Department of Physiology., Meo SA; Department of Physiology., Temsah MH; College of Medicine.; Evidence-Based Health Care and Knowledge Translation Research Chair, Family and Community Medicine Department, College of Medicine, King Saud University.; Pediatric Intensive Care Unit, Pediatric Department, College of Medicine, King Saud University Medical City. |
---|---|
Language: | English |
Source: | Annals of Medicine and Surgery (2012) [Ann Med Surg (Lond)] 2024 May 06; Vol. 86 (6), pp. 3261-3266. Date of Electronic Publication: 2024 May 06 (Print Publication: 2024). |
DOI: | 10.1097/MS9.0000000000002120 |
Abstract: | Background: The integration of artificial intelligence (AI) chatbots such as Google's Bard, OpenAI's ChatGPT, and Microsoft's Bing Chatbot into academic and professional domains, including cardiology, has been evolving rapidly. Their application in educational and research settings, however, raises questions about their efficacy, particularly in specialized fields such as cardiology. This study aims to evaluate the depth and accuracy of these AI chatbots' knowledge of cardiology using a multiple-choice question (MCQ) format. Methods: This exploratory, cross-sectional study was conducted in November 2023 using a bank of 100 MCQs covering various cardiology topics, compiled from authoritative textbooks and question banks. These MCQs were used to assess the knowledge level of Google's Bard, Microsoft Bing, and ChatGPT 4.0. Each question was entered manually into the chatbots, ensuring no memory-retention bias. Results: ChatGPT 4.0 demonstrated the highest knowledge score in cardiology, with 87% accuracy, followed by Bing at 60% and Bard at 46%. Performance varied across cardiology subtopics, with ChatGPT consistently outperforming the others. Notably, the study revealed significant differences in the chatbots' proficiency in specific cardiology domains. Conclusion: This study highlights a spectrum of efficacy among AI chatbots in disseminating cardiology knowledge. ChatGPT 4.0 emerged as a potential auxiliary educational resource in cardiology, surpassing traditional learning methods in some respects. However, the variability in performance among these AI systems underscores the need for cautious evaluation and continuous improvement, especially for chatbots such as Bard, to ensure reliability and accuracy in medical knowledge dissemination. Competing Interests: The authors report no personal or financial conflicts of interest to declare. Sponsorships or competing interests that may be relevant to the content are disclosed at the end of this article. (Copyright © 2024 The Author(s). Published by Wolters Kluwer Health, Inc.) |
Database: | MEDLINE |
External link: |
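The Methods and Results describe scoring each chatbot's answers to a 100-question bank and reporting accuracy overall and per cardiology subtopic. The snippet below is a minimal, hypothetical sketch of how such a tally could be computed; the record structure, field names, and sample rows are illustrative assumptions, not the authors' actual scoring procedure or data.

```python
# Hypothetical sketch: tally MCQ accuracy per model and per (model, subtopic),
# mirroring the per-chatbot and per-subtopic accuracies reported in the study.
from collections import defaultdict

def score(responses):
    """responses: list of dicts with keys 'model', 'subtopic', 'correct' (bool).
    Returns a dict mapping (model,) and (model, subtopic) tuples to accuracy."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in responses:
        for key in ((r["model"],), (r["model"], r["subtopic"])):
            totals[key] += 1
            hits[key] += r["correct"]  # bool counts as 0 or 1
    return {key: hits[key] / totals[key] for key in totals}

# Illustrative usage with made-up records (not study data):
sample = [
    {"model": "ChatGPT 4.0", "subtopic": "arrhythmia", "correct": True},
    {"model": "Bard", "subtopic": "arrhythmia", "correct": False},
]
for key, accuracy in score(sample).items():
    print(key, f"{accuracy:.0%}")
```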