Context-sensitive reward shaping for sparse interaction multi-agent systems

Authors:	Daniel Kudenko, Sam Devlin, Ann Nowé, Yann-Michaël De Hauwere
Year of publication:	2016
Subject:	0209 industrial biotechnology; Computer science; business.industry; Multi-agent system; Distributed computing; Context (language use); 02 engineering and technology; Air traffic control; Machine learning; computer.software_genre; Range (mathematics); 020901 industrial engineering & automation; Artificial Intelligence; Convergence (routing); 0202 electrical engineering, electronic engineering, information engineering; State space; Reinforcement learning; A priori and a posteriori; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Software
Source:	The Knowledge Engineering Review. 31:59-76
ISSN:	1469-8005 0269-8889
DOI:	10.1017/s0269888915000193
Description:	Potential-based reward shaping is a commonly used approach in reinforcement learning to direct exploration based on prior knowledge. In both single-agent and multi-agent settings, this technique speeds up learning without losing any theoretical convergence guarantees. However, if speed-ups through reward shaping are to be achieved in multi-agent environments, a different shaping signal should be used for each context in which agents have a different subgoal or are involved in a different interaction situation. This paper describes the use of context-aware potential functions in a multi-agent system in which the interactions between agents are sparse. This means that, unknown to the agents a priori, the interactions between the agents occur only sporadically, in certain regions of the state space. During these interactions, agents need to coordinate in order to reach the globally optimal solution. We demonstrate how different reward shaping functions can be used on top of Future Coordinating Q-learning (FCQ-learning), an algorithm capable of automatically detecting when agents should take each other into consideration. Using FCQ-learning, coordination problems can even be anticipated before they actually occur, allowing them to be solved in time. We evaluate our approach on a range of gridworld problems, as well as a simulation of air traffic control.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::c3242dd085a7fdd9c45e49ed19b2dc8a https://doi.org/10.1017/s0269888915000193
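
The abstract in this record describes potential-based reward shaping with a separate potential function per context. As a rough illustration only, the sketch below uses the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s) and selects Phi by a context label. The record does not give the paper's actual potential functions, so the negative Manhattan-distance potential, the context names "solo" and "coordinate", and all function names here are assumptions made for the example; this is not the FCQ-learning algorithm itself.

```python
from typing import Callable, Dict, Tuple

State = Tuple[int, int]            # (x, y) cell in a gridworld (assumed layout)
Potential = Callable[[State], float]


def manhattan_potential(goal: State) -> Potential:
    """Hypothetical potential: negative Manhattan distance to a (sub)goal."""
    def phi(s: State) -> float:
        return -float(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))
    return phi


def shaping_reward(phi: Potential, s: State, s_next: State, gamma: float) -> float:
    """Standard potential-based shaping term: gamma * Phi(s') - Phi(s)."""
    return gamma * phi(s_next) - phi(s)


def context_shaping_reward(
    potentials: Dict[str, Potential],
    context: str,
    s: State,
    s_next: State,
    gamma: float,
) -> float:
    """Pick the potential that belongs to the agent's current context."""
    return shaping_reward(potentials[context], s, s_next, gamma)


# Assumed contexts: acting alone (head for the final goal) versus being in a
# region where coordination is needed (head for a waiting cell instead).
potentials = {
    "solo": manhattan_potential(goal=(3, 3)),
    "coordinate": manhattan_potential(goal=(0, 3)),
}

f_solo = context_shaping_reward(potentials, "solo", (0, 0), (1, 0), gamma=1.0)
f_coord = context_shaping_reward(potentials, "coordinate", (0, 0), (1, 0), gamma=1.0)
```

In this toy setup the same move from (0, 0) to (1, 0) is encouraged in the "solo" context and discouraged in the "coordinate" context, which is the kind of context-dependent signal the abstract argues for. The shaped reward that a learner would use is the environment reward plus this term.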