A Replaceable Curiosity-Driven Candidate Agent Exploration Approach for Task-Oriented Dialog Policy Learning

Author: Xuecheng Niu, Akinori Ito, Takashi Nose
Language: English
Publication Year: 2024
Subject:
Source: IEEE Access, Vol 12, Pp 142640-142650 (2024)
Document Type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3462719
Description: Task-oriented dialog policy learning is often formulated as a reinforcement learning (RL) problem in which rewards from the environment are extremely sparse, so an agent acting randomly will rarely find the reward. Exploration techniques are therefore of primary importance when solving RL problems, and more sophisticated exploration methods must be devised. In this study, we propose a replaceable curiosity-driven candidate agent exploration approach that encourages the agent to balance action sampling and explore new environments without overly violating dialog strategies. In this framework, we adopt the curiosity model but design a weight for the curiosity reward to balance exploration and exploitation. We design a multi-candidate agent mechanism that selects an agent with relatively balanced action sampling for formal dialog training, motivating agents to escape pseudo-optimal actions in the early training stage. In addition, we propose, for the first time, a replacement mechanism that prevents the elected agent from performing poorly in the later stages of training and fully utilizes all the candidate agents. The experimental results show that the adjustable curiosity reward promotes dialog policy convergence, and that the agent replacement mechanism effectively blocks the training of poorly trained agents, significantly increasing the task's average success rate and reducing the number of dialog turns. Compared to baselines, the replaceable curiosity-driven candidate agent exploration approach yields a higher average success rate of 0.714 and a lower average number of turns of 20.6.
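The weighted curiosity reward described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the squared-error (ICM-style) curiosity signal, and the fixed weight are all assumptions, since the record does not specify the exact curiosity model used.

```python
def curiosity_reward(predicted_next_state, next_state):
    """Intrinsic reward: forward-model prediction error (ICM-style sketch).

    Both arguments are feature vectors (lists of floats). The curiosity
    signal is large when the forward model predicts the next state badly,
    i.e. when the transition is novel to the agent.
    """
    return 0.5 * sum((p - s) ** 2 for p, s in zip(predicted_next_state, next_state))


def shaped_reward(extrinsic, predicted_next_state, next_state, weight):
    """Extrinsic task reward plus a weighted curiosity bonus.

    `weight` balances exploitation (the sparse extrinsic reward) against
    exploration (the curiosity bonus); in practice it would typically be
    annealed as training progresses.
    """
    return extrinsic + weight * curiosity_reward(predicted_next_state, next_state)


# A novel transition (large prediction error) earns a larger shaped reward
r_novel = shaped_reward(0.0, [0.9, 0.1], [0.1, 0.1], weight=0.5)
# A familiar transition (small prediction error) earns almost no bonus
r_familiar = shaped_reward(0.0, [0.12, 0.1], [0.1, 0.1], weight=0.5)
print(r_novel, r_familiar)
```

Under this sketch, lowering `weight` over training shifts the agent from exploration toward exploiting the learned dialog strategy, which matches the exploration-exploitation balance the abstract describes.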
Database: Directory of Open Access Journals