A Markov chain Monte Carlo algorithm for Bayesian policy search

Authors:	Ahmet Onat, Sinan Yildirim, Vahid Tavakol Aghaei
Language:	English
Year of publication:	2018
Subject:	Mathematical optimization; Control and Optimization; Computer science; Bayesian probability; lcsh:Control engineering systems. Automatic machinery (General); 02 engineering and technology; 01 natural sciences; lcsh:TA168; lcsh:TJ212-225; 010104 statistics & probability; Local optimum; Artificial Intelligence; Search algorithm; Reinforcement learning; 0202 electrical engineering, electronic engineering, information engineering; particle filtering; QA Mathematics; 0101 mathematics; Markov chain Monte Carlo; Discrete time and continuous time; policy search; lcsh:Systems engineering; Control and Systems Engineering; risk sensitive reward; 020201 artificial intelligence & image processing; Markov decision process; Particle filter; control
Source:	Systems Science & Control Engineering, Vol 6, Iss 1, Pp 438-455 (2018)
DOI:	10.1080/21642583.2018.1528483
Description:	Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient and may therefore suffer from slow convergence or become trapped in local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm, for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and a multiplicative reward structure. For this purpose, we assume a prior over the policy parameters and target the ‘posterior’ distribution in which the ‘likelihood’ is the expected reward. We propose a Markov chain Monte Carlo (MCMC) algorithm for generating samples of the policy parameters from this posterior. The proposed algorithm is compared with several well-known policy-gradient-based RL methods and performs better in terms of time response and convergence rate when applied to a nonlinear model of a Cart-Pole benchmark.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::23796e4edac3a0742105aa2b6f9a01d1 https://doi.org/10.1080/21642583.2018.1528483
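
Illustration:	The description above treats the expected reward as a 'likelihood' and samples policy parameters from prior × expected reward. The following is a minimal sketch of that idea and not the authors' algorithm: it uses a plain random-walk Metropolis-Hastings sampler with a noisy Monte Carlo estimate of the expected reward, on a hypothetical one-dimensional linear system rather than the Cart-Pole model. The dynamics, the linear policy a = -theta * x, the reward exp(-x^2) per step, the Gaussian prior, and all function names are assumptions made for illustration only.

```python
import math
import random


def estimate_expected_reward(theta, rng, n_rollouts=32, horizon=20):
    """Monte Carlo estimate of the expected multiplicative reward J(theta).

    Toy assumptions: state update x' = x + a + noise with linear policy
    a = -theta * x, and total reward prod_t exp(-x_t^2).
    """
    total = 0.0
    for _ in range(n_rollouts):
        x = rng.gauss(1.0, 0.1)  # random initial state
        log_reward = 0.0
        for _ in range(horizon):
            x = (1.0 - theta) * x + rng.gauss(0.0, 0.05)
            log_reward -= x * x  # log of the per-step reward exp(-x^2)
        total += math.exp(log_reward)  # product of per-step rewards
    return total / n_rollouts


def log_prior(theta, scale=2.0):
    """Unnormalised log density of an assumed zero-mean Gaussian prior."""
    return -0.5 * (theta / scale) ** 2


def mcmc_policy_search(n_iters=2000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings targeting prior(theta) * J(theta).

    The reward estimate of the current state is stored and reused, so the
    noisy estimate is only redrawn for each new proposal.
    """
    rng = random.Random(seed)
    theta = 0.0
    j_hat = estimate_expected_reward(theta, rng)
    samples = []
    for _ in range(n_iters):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        j_prop = estimate_expected_reward(proposal, rng)
        if j_prop > 0.0:
            log_ratio = (log_prior(proposal) + math.log(j_prop)
                         - log_prior(theta) - math.log(j_hat))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                theta, j_hat = proposal, j_prop
        # A proposal with zero estimated reward is always rejected.
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = mcmc_policy_search()
    kept = draws[500:]  # discard an assumed burn-in period
    print("posterior mean of theta:", sum(kept) / len(kept))
```

In this toy setting the reward is highest when the policy gain drives the state to zero in one step, so the sampled gains should concentrate around that region instead of following a gradient toward it.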