Improving cognitive agent decision making: Experience trajectories as plans

Author:	Liz Sonenberg, Michael Kirley, Samin Karim, Jens Pfau
Year of publication:	2014
Subject:	Computer Networks and Communications; Computer science; Management science; Conflation; Reuse; Task (project management); Action (philosophy); Artificial Intelligence; Human–computer interaction; Reinforcement learning; Abstraction; State (computer science); Reinforcement; Software
Source:	Web Intelligence and Agent Systems: An International Journal. 12:267-287
ISSN:	1570-1263
DOI:	10.3233/wia-140296
Description:	In task environments with large state and action spaces, the use of temporal and state abstraction can potentially improve the decision making performance of agents. However, existing approaches within a reinforcement learning framework typically identify possible subgoal states and instantly learn stochastic subpolicies to reach them from other states. In these circumstances, exploration of the reinforcement learner is unfavorably biased towards local behavior around these subgoals; temporal abstractions are not exploited to reduce required deliberation; and the benefit of employing temporal abstractions is conflated with the benefit of additional learning done to define subpolicies. In this paper, we consider a cognitive agent architecture that allows for the extraction and reuse of temporal abstractions in the form of experience trajectories from a bottom-level reinforcement learning module and a top-level module based on the BDI (Belief-Desire-Intention) model. Here, the reuse of trajectories depends on the situation in which their recording was started. We investigate the efficacy of our approach using two well-known domains – the pursuit and the taxi domains. Detailed simulation experiments demonstrate that the use of experience trajectories as plans acquired at runtime can reduce the amount of decision making without significantly affecting asymptotic performance. The combination of temporal and state abstraction leads to improved performance during the initial learning of the reinforcement learner. Our approach can significantly reduce the number of deliberations required.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::43ad7f2c2226eddb146b672663a86ea4 https://doi.org/10.3233/wia-140296
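
The description above says that experience trajectories are recorded at runtime and reused as plans, with reuse depending on the situation in which recording started. The following is only a minimal sketch of that idea, not the authors' architecture. All names (`TrajectoryLibrary`, `Agent`, `run_episode`) and the toy corridor task are hypothetical. A fixed policy stands in for the bottom-level reinforcement learner, and keeping the shortest trajectory per start situation is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional


@dataclass
class TrajectoryLibrary:
    """Hypothetical store: start situation -> recorded action sequence."""
    plans: Dict[Hashable, List[str]] = field(default_factory=dict)

    def record(self, start_situation: Hashable, actions: List[str]) -> None:
        # Assumption: keep the shortest trajectory seen for a start situation.
        current = self.plans.get(start_situation)
        if current is None or len(actions) < len(current):
            self.plans[start_situation] = list(actions)

    def lookup(self, situation: Hashable) -> Optional[List[str]]:
        return self.plans.get(situation)


class Agent:
    """Replays a stored trajectory when one matches; otherwise deliberates."""

    def __init__(self, policy: Callable[[int], str], library: TrajectoryLibrary):
        self.policy = policy          # stand-in for the bottom-level learner
        self.library = library
        self.pending: List[str] = []  # remaining steps of the plan in progress
        self.deliberations = 0        # counts bottom-level policy calls only

    def act(self, state: int) -> str:
        if not self.pending:
            plan = self.library.lookup(state)
            if plan:
                self.pending = list(plan)
        if self.pending:
            return self.pending.pop(0)  # follow the plan, no deliberation
        self.deliberations += 1
        return self.policy(state)


def run_episode(agent: Agent, start: int = 0, goal: int = 5) -> List[str]:
    """Toy 1-D corridor: 'right' moves +1, any other action moves -1."""
    state, actions = start, []
    while state != goal:
        action = agent.act(state)
        actions.append(action)
        state += 1 if action == "right" else -1
    agent.library.record(start, actions)
    return actions


agent = Agent(policy=lambda s: "right", library=TrajectoryLibrary())
first = run_episode(agent)
after_first = agent.deliberations
second = run_episode(agent)
after_second = agent.deliberations
print(after_first, after_second - after_first)  # prints "5 0"
```

In this toy run the second episode replays the recorded trajectory and makes no further policy calls, which mirrors the claim that plans acquired at runtime can reduce the amount of deliberation. It says nothing about the pursuit or taxi domains evaluated in the paper.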