Popular large language model chatbots' accuracy, comprehensiveness, and self-awareness in answering ocular symptom queries.

Authors: Pushpanathan K; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore., Lim ZW; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore., Er Yew SM; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore., Chen DZ; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Department of Ophthalmology, National University Hospital, Singapore, Singapore., Hui'En Lin HA; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Department of Ophthalmology, National University Hospital, Singapore, Singapore., Lin Goh JH; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore., Wong WM; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Department of Ophthalmology, National University Hospital, Singapore, Singapore., Wang X; Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, Beijing, China.; Advanced Innovation Centre for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University, Beijing, China., Jin Tan MC; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Department of Ophthalmology, National University Hospital, Singapore, Singapore., Chang Koh VT; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Department of Ophthalmology, National University Hospital, Singapore, Singapore., Tham YC; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Centre for Innovation and Precision Eye Health & Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.; Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore.; Ophthalmology and Visual Sciences Academic Clinical Programme (Eye ACP), Duke NUS Medical School, Singapore, Singapore.
Language: English
Source: IScience [iScience] 2023 Oct 10; Vol. 26 (11), pp. 108163. Date of Electronic Publication: 2023 Oct 10 (Print Publication: 2023).
DOI: 10.1016/j.isci.2023.108163
Abstract: In light of growing interest in using emerging large language models (LLMs) for self-diagnosis, we systematically assessed the performance of ChatGPT-3.5, ChatGPT-4.0, and Google Bard in delivering proficient responses to 37 common inquiries regarding ocular symptoms. Responses were masked, randomly shuffled, and then graded by three consultant-level ophthalmologists for accuracy (poor, borderline, good) and comprehensiveness. Additionally, we evaluated the self-awareness capabilities (ability to self-check and self-correct) of the LLM-Chatbots. 89.2% of ChatGPT-4.0 responses were 'good'-rated, outperforming ChatGPT-3.5 (59.5%) and Google Bard (40.5%) significantly (all p < 0.001). All three LLM-Chatbots showed optimal mean comprehensiveness scores as well (ranging from 4.6 to 4.7 out of 5). However, they exhibited subpar to moderate self-awareness capabilities. Our study underscores the potential of ChatGPT-4.0 in delivering accurate and comprehensive responses to ocular symptom inquiries. Future rigorous validation of their performance is crucial to ensure their reliability and appropriateness for actual clinical use.
Competing Interests: All authors declare no competing interests.
(© 2023 The Authors.)
Database: MEDLINE