The Impact of Preference Agreement in Reinforcement Learning from Human Feedback: A Case Study in Summarization

Author:	Gooding, Sian; Mansoor, Hassan
Year of publication:	2023
Subject:	Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Human-Computer Interaction
Document type:	Working Paper
Description:	Reinforcement Learning from Human Feedback (RLHF) can be used to capture complex and nuanced properties of text generation quality. As a result, the task of text summarization has been identified as a good candidate for this process. In this paper, we explore how preference agreement impacts the efficacy of RLHF for summarization. We show that sampling human preferences to include a range of annotator agreement (1) results in higher-accuracy reward models and (2) alters the characteristics of quality captured. We additionally show improvements in downstream generation when using a reward model trained with a range of preference agreements. Our contributions have implications for the design of synthetic datasets as well as the importance of considering quality differentials in comparison-based data.
Database:	arXiv
External link:	http://arxiv.org/abs/2311.04919