Free energy based policy gradients.

Authors: Theodorou, Evangelos A.; Najemnik, Jiri; Todorov, Emo
Source: 2013 IEEE Symposium on Adaptive Dynamic Programming & Reinforcement Learning (ADPRL); 2013, pp. 124-131, 8 pp.
Abstract: Despite the plethora of reinforcement learning algorithms in machine learning and control, most of the work in this area relies on discrete-time formulations of stochastic dynamics. In this work we present a new policy gradient algorithm for reinforcement learning in continuous state-action spaces and continuous time, for free-energy-like cost functions. The derivation is based on successive application of Girsanov's theorem and on the Radon-Nikodým derivative as formulated for Markov diffusion processes. The resulting policy gradient is reward weighted. The use of the Radon-Nikodým derivative extends the analysis and results to more general models of stochasticity, including jump diffusion processes. We apply the resulting algorithm to two simple examples of learning attractor landscapes for rhythmic and discrete movements. [ABSTRACT FROM PUBLISHER]
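The abstract says the resulting policy gradient is reward weighted. As a rough, non-authoritative illustration only, the Python sketch below shows what a reward-weighted gradient of a free-energy cost looks like in a simple discrete-sample setting. It is not the authors' continuous-time, Girsanov-based derivation. It assumes the free energy F(θ) = -λ log E[exp(-J/λ)] with temperature λ, a 1-D Gaussian sampling distribution N(θ, σ²) standing in for the policy, and a quadratic toy cost. The function names `free_energy_gradient` and `optimize` are hypothetical. Differentiating F with the likelihood-ratio trick gives ∇F = -λ Σ w_i ∇ log p_θ(u_i), where the weights w_i ∝ exp(-J_i/λ) are normalized to sum to one.

```python
import math
import random


def free_energy_gradient(theta, samples, costs, lam, sigma):
    """Hypothetical helper: likelihood-ratio estimate of dF/dtheta,
    where F = -lam * log E[exp(-J / lam)].

    Assumes a 1-D Gaussian sampling distribution N(theta, sigma^2), so
    d/dtheta log p(u) = (u - theta) / sigma^2.
    """
    j_min = min(costs)  # shift costs so exp() cannot overflow
    raw = [math.exp(-(j - j_min) / lam) for j in costs]
    total = sum(raw)
    weights = [r / total for r in raw]  # cost-based weights, summing to 1
    scores = [(u - theta) / sigma ** 2 for u in samples]
    return -lam * sum(w * s for w, s in zip(weights, scores))


def optimize(cost, theta0, lam=1.0, sigma=0.5, n_samples=1000, iters=30, seed=0):
    """Hypothetical toy optimizer: gradient descent on the free energy F."""
    rng = random.Random(seed)
    theta = theta0
    # With this step size, each update equals the weighted mean of the samples.
    step = sigma ** 2 / lam
    for _ in range(iters):
        samples = [rng.gauss(theta, sigma) for _ in range(n_samples)]
        costs = [cost(u) for u in samples]
        theta -= step * free_energy_gradient(theta, samples, costs, lam, sigma)
    return theta


if __name__ == "__main__":
    target = 2.0
    theta = optimize(lambda u: (u - target) ** 2, theta0=0.0)
    print(f"theta after optimization: {theta:.3f}")
```

With the step size σ²/λ, each update reduces to θ ← Σ w_i u_i, a cost-weighted average of the sampled parameters, which makes the reward weighting explicit.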
Database: Complementary Index