Showing 1 - 1 of 1 for search: '"Roostaie, Sahar"'
Trust Region Policy Optimization (TRPO) is a popular and empirically successful policy search algorithm in reinforcement learning (RL). It iteratively solves a surrogate problem that restricts consecutive policies to be close to each other. TRPO i
External link:
http://arxiv.org/abs/2110.13373
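
For orientation, the surrogate problem mentioned in the abstract is commonly written as a KL-constrained policy update. The form below is the standard TRPO formulation, given as a sketch only: the symbols $\theta_k$ (current policy parameters), $d^{\pi_{\theta_k}}$ (state visitation distribution), $A^{\pi_{\theta_k}}$ (advantage function), and $\delta$ (trust-region radius) are assumed notation and may differ from what the linked paper uses.

```latex
\[
\begin{aligned}
\theta_{k+1} = \arg\max_{\theta} \quad
  & \mathbb{E}_{s \sim d^{\pi_{\theta_k}},\; a \sim \pi_{\theta_k}(\cdot \mid s)}
    \left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_k}(a \mid s)}
           \, A^{\pi_{\theta_k}}(s, a) \right] \\
\text{subject to} \quad
  & \mathbb{E}_{s \sim d^{\pi_{\theta_k}}}
    \left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_k}(\cdot \mid s) \,\middle\|\, \pi_{\theta}(\cdot \mid s) \right) \right]
    \le \delta
\end{aligned}
\]
```

The constraint is what keeps consecutive policies close to each other: a smaller $\delta$ permits only small changes per iteration.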