Online vs Offline: A Comparative Study of First-Party and Third-Party Evaluations of Social Chatbots

Author:	Svikhnushina, Ekaterina; Pu, Pearl
Publication year:	2024
Subject:	Computer Science - Human-Computer Interaction; Computer Science - Computation and Language
Document type:	Working Paper
Description:	This paper explores the efficacy of online versus offline evaluation methods in assessing conversational chatbots, specifically comparing first-party direct interactions with third-party observational assessments. By extending a benchmarking dataset of user dialogs with empathetic chatbots with offline third-party evaluations, we present a systematic comparison between the feedback from online interactions and the more detached offline third-party evaluations. Our results reveal that offline human evaluations fail to capture the subtleties of human-chatbot interactions as effectively as online assessments. In comparison, automated third-party evaluations using a GPT-4 model offer a better approximation of first-party human judgments given detailed instructions. This study highlights the limitations of third-party evaluations in grasping the complexities of user experiences and advocates for the integration of direct interaction feedback in conversational AI evaluation to enhance system development and user satisfaction.
Database:	arXiv
External link:	http://arxiv.org/abs/2409.07823