Performance of Large Language Models in Patient Complaint Resolution: Web-Based Cross-Sectional Survey.

Authors:
Yong LPX; Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore; Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore.
Tung JYM; Department of Urology, Singapore General Hospital, Singapore, Singapore.
Lee ZY; Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore; Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore.
Kuan WS; Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore; Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore.
Chua MT; Emergency Medicine Department, National University Hospital, National University Health System, Singapore, Singapore; Department of Surgery, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Urgent Care Centre, Alexandra Hospital, National University Health System, Singapore, Singapore.
Language: English
Source: Journal of medical Internet research [J Med Internet Res] 2024 Aug 09; Vol. 26, pp. e56413. Date of Electronic Publication: 2024 Aug 09.
DOI: 10.2196/56413
Abstract: Background: Patient complaints are a perennial challenge faced by health care institutions globally, requiring extensive time and effort from health care workers. Despite these efforts, patient dissatisfaction remains high. Recent studies on the use of large language models (LLMs), such as the GPT models developed by OpenAI, in the health care sector have shown great promise, with LLMs able to provide more detailed and empathetic responses than physicians. LLMs could potentially be used in responding to patient complaints to improve patient satisfaction and complaint response time.
Objective: This study aims to evaluate the performance of LLMs in addressing patient complaints received by a tertiary health care institution, with the goal of enhancing patient satisfaction.
Methods: Anonymized patient complaint emails and associated responses from the patient relations department were obtained. ChatGPT-4.0 (OpenAI, Inc) was provided with the same complaint email and tasked to generate a response. The complaints and the respective responses were uploaded onto a web-based questionnaire. Respondents were asked to rate both responses on a 10-point Likert scale for 4 items: appropriateness, completeness, empathy, and satisfaction. Participants were also asked to choose a preferred response at the end of each scenario.
Results: There were 188 respondents in total, of whom 115 (61.2%) were health care workers. A majority of the respondents, including both health care and non-health care workers, preferred replies from ChatGPT (n=164, 87.2% to n=183, 97.3%). GPT-4.0 responses were rated higher in all 4 assessed items, with median scores of 8 (IQR 7-9) on every item, compared with human responses (appropriateness 5, IQR 3-7; empathy 4, IQR 3-6; quality 5, IQR 3-6; satisfaction 5, IQR 3-6; P<.001), and had a higher average word count (238 vs 76 words). Regression analyses showed that a higher word count was a statistically significant predictor of higher scores in all 4 items, with every 1-word increment associated with an increase in scores of between 0.015 and 0.019 (all P<.001). However, on subgroup analysis by authorship, this held true only for responses written by patient relations department staff and not for those generated by ChatGPT, which received consistently high scores irrespective of response length.
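As a rough arithmetic check on the figures reported in the Results, the sketch below recomputes the respondent percentages and shows what the reported per-word regression slopes would imply over the 162-word gap between the average response lengths. The function names are hypothetical and not from the study, and the linear extrapolation is illustrative only: the abstract states that the word-count effect held only for human-written responses.

```python
# Figures are taken from the abstract; function names are illustrative only.

def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to 1 decimal place."""
    return round(100 * part / total, 1)


def implied_score_change(extra_words: int, slope_per_word: float) -> float:
    """Score change implied by a per-word regression slope (naive linear extrapolation)."""
    return extra_words * slope_per_word


TOTAL_RESPONDENTS = 188

# Share of respondents who were health care workers.
print(percent(115, TOTAL_RESPONDENTS))
# Lower and upper ends of the reported preference range for ChatGPT replies.
print(percent(164, TOTAL_RESPONDENTS))
print(percent(183, TOTAL_RESPONDENTS))

# Gap between average word counts (ChatGPT 238 vs human 76).
word_gap = 238 - 76
low = implied_score_change(word_gap, 0.015)
high = implied_score_change(word_gap, 0.019)
print(f"{low:.2f} to {high:.2f} points on the 10-point scale")
```

Under these assumptions the slopes would account for roughly 2.4 to 3.1 points over a 162-word difference, which is of the same order as the gap between the median scores, although the abstract does not claim that word count explains that gap.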
Conclusions: This study provides significant evidence supporting the effectiveness of LLMs in the resolution of patient complaints. ChatGPT demonstrated superiority in terms of response appropriateness, empathy, quality, and overall satisfaction when compared against actual human responses to patient complaints. Future research could measure the degree of improvement that artificial intelligence-generated responses can bring in terms of time savings, cost-effectiveness, patient satisfaction, and stress reduction for the health care system.
(©Lorraine Pei Xian Yong, Joshua Yi Min Tung, Zi Yao Lee, Win Sen Kuan, Mui Teng Chua. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 09.08.2024.)
Database: MEDLINE