ChatGPT-3.5 and Bing Chat in ophthalmology: an updated evaluation of performance, readability, and informative sources.
Author: | Tao BK; Faculty of Medicine, The University of British Columbia, 317-2194 Health Sciences Mall, Vancouver, BC, V6T 1Z3, Canada. Hua N; Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada. Milkovich J; Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada. Micieli JA; Temerty Faculty of Medicine, University of Toronto, 1 King's College Circle, Toronto, ON, M5S 1A8, Canada; Department of Ophthalmology and Vision Sciences, University of Toronto, 340 College Street, Toronto, ON, M5T 3A9, Canada; Division of Neurology, Department of Medicine, University of Toronto, 6 Queen's Park Crescent West, Toronto, ON, M5S 3H2, Canada; Kensington Vision and Research Center, 340 College Street, Toronto, ON, M5T 3A9, Canada; St. Michael's Hospital, 36 Queen Street East, Toronto, ON, M5B 1W8, Canada; Toronto Western Hospital, 399 Bathurst Street, Toronto, ON, M5T 2S8, Canada; University Health Network, 190 Elizabeth Street, Toronto, ON, M5G 2C4, Canada. jonathanmicieli@gmail.com |
---|---|
Language: | English |
Source: | Eye (London, England) [Eye (Lond)] 2024 Jul; Vol. 38 (10), pp. 1897-1902. Date of Electronic Publication: 2024 Mar 20. |
DOI: | 10.1038/s41433-024-03037-w |
Abstract: | Background/objectives: Experimental investigation. The integration of ChatGPT-4 (OpenAI) into Bing Chat (Microsoft) has conferred the capability of accessing online data published after 2021. We investigated its performance against ChatGPT-3.5 on a multiple-choice ophthalmology exam. Subjects/methods: In August 2023, ChatGPT-3.5 and Bing Chat were evaluated against 913 questions derived from the American Academy of Ophthalmology's Basic and Clinical Science Course (BCSC) collection. For each response, the sub-topic, performance, Simple Measure of Gobbledygook (SMOG) readability score (measuring the years of education required to understand a given passage; a formula sketch follows this record), and cited resources were collected. The primary outcomes were the comparative scores between models and, qualitatively, the resources referenced by Bing Chat. Secondary outcomes included performance stratified by response readability, question type (explicit or situational), and BCSC sub-topic. Results: Across 913 questions, ChatGPT-3.5 scored 59.69% [95% CI 56.45, 62.94] while Bing Chat scored 73.60% [95% CI 70.69, 76.52]. Both models performed significantly better on explicit than on clinical reasoning questions, and both performed better on general medicine questions than on ophthalmology subsections. Bing Chat referenced 927 online entities and provided at least one citation for 836 of the 913 questions; the use of more reliable (peer-reviewed) sources was associated with a higher likelihood of a correct response. The most-cited resources were eyewiki.aao.org, aao.org, wikipedia.org, and ncbi.nlm.nih.gov. Bing Chat showed significantly better readability than ChatGPT-3.5, averaging a reading level of grade 11.4 [95% CI 7.14, 15.7] versus 12.4 [95% CI 8.77, 16.1], respectively (p < 0.0001, ρ = 0.25). Conclusions: The online access, improved readability, and citation feature of Bing Chat confer additional utility for ophthalmology learners. We recommend critical appraisal of cited sources during response interpretation. (© 2024. The Author(s), under exclusive licence to The Royal College of Ophthalmologists.) |
Database: | MEDLINE |
External link: |
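
As context for the readability figures in the abstract, the SMOG grade is conventionally computed from the count of polysyllabic words (three or more syllables) in a text sample, normalized to 30 sentences (McLaughlin, 1969). The minimal sketch below illustrates that standard formula; the vowel-group syllable heuristic and the sample sentences are illustrative assumptions, not the tooling used in the article.

```python
import re
from math import sqrt

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels.
    # Real SMOG tooling uses dictionary-based syllabification.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    # Standard SMOG formula (McLaughlin, 1969):
    #   grade = 3.1291 + 1.0430 * sqrt(polysyllables * 30 / sentences)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 3.1291 + 1.0430 * sqrt(polysyllables * 30 / len(sentences))

# Hypothetical sample passage for illustration only.
sample = ("Glaucoma is a progressive optic neuropathy. "
          "Intraocular pressure is a modifiable risk factor.")
print(round(smog_grade(sample), 1))
```

On this scale, the reported grades of 11.4 (Bing Chat) versus 12.4 (ChatGPT-3.5) correspond to roughly one fewer year of schooling needed to comprehend Bing Chat's responses.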