SocialBench: Sociality Evaluation of Role-Playing Conversational Agents

Author: Chen, Hongzhan; Chen, Hehong; Yan, Ming; Xu, Wenshen; Gao, Xing; Shen, Weizhou; Quan, Xiaojun; Li, Chenliang; Zhang, Ji; Huang, Fei; Zhou, Jingren
Publication Year: 2024
Document Type: Working Paper
Description: Large language models (LLMs) have advanced the development of various AI conversational agents, including role-playing conversational agents that mimic diverse characters and human behaviors. While prior research has predominantly focused on enhancing the conversational capability, role-specific knowledge, and stylistic attributes of these agents, there has been a noticeable gap in assessing their social intelligence. In this paper, we introduce SocialBench, the first benchmark designed to systematically evaluate the sociality of role-playing conversational agents at both the individual and group levels of social interaction. The benchmark is constructed from a variety of sources and covers 500 characters, over 6,000 question prompts, and 30,800 multi-turn role-playing utterances. We conduct comprehensive evaluations on this benchmark using mainstream open-source and closed-source LLMs. We find that agents excelling at the individual level do not necessarily perform well at the group level. Moreover, the behavior of individual agents may drift under the influence of other agents within the group. Experimental results on SocialBench confirm its value as a testbed for assessing the social interaction of role-playing conversational agents. The benchmark is publicly accessible at https://github.com/X-PLUG/SocialBench.
Comment: ACL 2024 Findings
Database: arXiv