Comparative study of different large language models and medical professionals of different levels responding to ophthalmology questions

Authors: Huang Hui, Hu Jinyu, Wang Xiaoyu, Ye Shuyuan, Wu Shinan, Chen Cheng, He Liangqi, Zeng Yanmei, Wei Hong, Shao Yi
Language: English
Year of publication: 2024
Subject:
Source: Guoji Yanke Zazhi, Vol 24, Iss 3, Pp 458-462 (2024)
Document type: article
ISSN: 1672-5123
DOI: 10.3980/j.issn.1672-5123.2024.3.24
Description: AIM: To evaluate the performance of three large language models (LLMs), GPT-3.5, GPT-4, and PaLM2, in answering ophthalmology questions, and to compare their performance with that of medical professionals at three levels: medical undergraduates, master's students in medicine, and attending physicians. METHODS: A total of 100 ophthalmology multiple-choice questions, covering basic ophthalmic knowledge, clinical knowledge, ophthalmic examination and diagnostic methods, and treatment of ocular disease, were administered to the three LLMs and to medical professionals at three levels (9 undergraduates, 6 postgraduates, and 3 attending physicians). LLM performance was evaluated in terms of mean score, response consistency, and response confidence, and was compared with that of the human participants. RESULTS: Each LLM surpassed the average performance of the undergraduate medical students (GPT-4: 56, GPT-3.5: 42, PaLM2: 47, undergraduate students: 40). The performance of GPT-3.5 and PaLM2 was slightly lower than that of the master's students (51), while GPT-4 performed at a level comparable to the attending physicians (62). Furthermore, GPT-4 showed significantly higher response consistency and self-confidence than GPT-3.5 and PaLM2. CONCLUSION: LLMs, as represented by GPT-4, perform well in the field of ophthalmology and can serve as aids to clinical decision-making and teaching for clinicians and medical education.
Database: Directory of Open Access Journals