Showing 1 - 10 of 15,968 results for search: '"Policy iteration"'
In optimal control problems, policy iteration (PI) is a powerful reinforcement learning (RL) tool used for designing optimal controllers for linear systems. However, the need for an initial stabilizing control policy significantly limits its applic…
External link:
http://arxiv.org/abs/2411.07825
We present a policy iteration algorithm for the infinite-horizon N-player general-sum deterministic linear quadratic dynamic games and compare it to policy gradient methods. We demonstrate that the proposed policy iteration algorithm is distinct from…
External link:
http://arxiv.org/abs/2410.03106
In this paper, we address Linear Quadratic Regulator (LQR) problems through a novel iterative algorithm named EXtremum-seeking Policy iteration LQR (EXP-LQR). The peculiarity of EXP-LQR is that it only needs access to a truncated approximation of the…
External link:
http://arxiv.org/abs/2412.02758
This paper revisits and extends the convergence and robustness properties of value and policy iteration algorithms for discrete-time linear quadratic regulator problems. In the model-based case, we extend current results concerning the region of expo…
External link:
http://arxiv.org/abs/2411.04548
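The discrete-time LQR policy iteration scheme that several of these results analyze can be sketched in a few lines. The following is a minimal illustration (not code from any of the listed papers) of the classic Lyapunov-based iteration, often attributed to Hewer: evaluate the current gain by solving a discrete Lyapunov equation, then improve it greedily. The function name and the scalar example are illustrative; note it assumes an initial stabilizing gain `K0`, the very requirement the first result above points to as a limitation.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

def lqr_policy_iteration(A, B, Q, R, K0, iters=50):
    """Policy iteration for discrete-time LQR (illustrative sketch).

    K0 must make A - B @ K0 stable; each pass solves a Lyapunov
    equation (policy evaluation) and updates the gain (improvement).
    """
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: P = Acl' P Acl + Q + K' R K.
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: K = (R + B' P B)^{-1} B' P A.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Toy scalar system: unstable plant x+ = 1.1 x + u, deadbeat init K0 = 1.1.
A, B = np.array([[1.1]]), np.array([[1.0]])
Q, R = np.array([[1.0]]), np.array([[1.0]])
K, P = lqr_policy_iteration(A, B, Q, R, K0=np.array([[1.1]]))
# The iterates converge to the Riccati solution:
assert np.allclose(P, solve_discrete_are(A, B, Q, R))
```

The Lyapunov solve is the model-based analogue of the policy evaluation step; the RL variants surveyed in these abstracts replace it with data-driven estimates.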
Simulation-Based Optimistic Policy Iteration For Multi-Agent MDPs with Kullback-Leibler Control Cost
This paper proposes an agent-based optimistic policy iteration (OPI) scheme for learning stationary optimal stochastic policies in multi-agent Markov Decision Processes (MDPs), in which agents incur a Kullback-Leibler (KL) divergence cost for their c…
External link:
http://arxiv.org/abs/2410.15156
We propose a policy iteration method to solve an inverse problem for a mean-field game (MFG) model, specifically to reconstruct the obstacle function in the game from the partial observation data of value functions, which represent the optimal costs…
External link:
http://arxiv.org/abs/2409.06184
Author:
Possamaï, Dylan, Tangpi, Ludovic
In this paper, we propose a new policy iteration algorithm to compute the value function and the optimal controls of continuous-time stochastic control problems. The algorithm relies on successive approximations using linear-quadratic control problem…
External link:
http://arxiv.org/abs/2409.04037
In offline reinforcement learning (RL), it is necessary to manage out-of-distribution actions to prevent overestimation of value functions. Policy-regularized methods address this problem by constraining the target policy to stay close to the behavio…
External link:
http://arxiv.org/abs/2405.20555
Author:
Meyer, Nico, Murauer, Jakob, Popov, Alexander, Ufrecht, Christian, Plinge, Axel, Mutschler, Christopher, Scherer, Daniel D.
Reinforcement learning is a powerful framework aiming to determine optimal behavior in highly complex decision-making scenarios. This objective can be achieved using policy iteration, which requires solving a typically large linear system of equatio…
External link:
http://arxiv.org/abs/2404.10546
We consider inexact policy iteration methods for large-scale infinite-horizon discounted MDPs with finite spaces, a variant of policy iteration where the policy evaluation step is implemented inexactly using an iterative solver for linear systems. In…
External link:
http://arxiv.org/abs/2404.06136
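The linear system mentioned in the last two abstracts arises in the policy evaluation step of tabular policy iteration: evaluating a fixed policy means solving (I - γ P_π) v = r_π. Below is a minimal, self-contained sketch for a finite discounted MDP, with an exact `np.linalg.solve` for evaluation; the toy two-state MDP and all names are illustrative, not from the papers above. An inexact variant would replace the direct solve with an iterative linear solver.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Exact policy iteration for a finite discounted MDP (sketch).

    P: (A, S, S) transition probabilities, R: (S, A) rewards.
    Returns the optimal deterministic policy and its value function.
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        P_pi = P[policy, np.arange(S)]      # (S, S) rows for chosen actions
        r_pi = R[np.arange(S), policy]      # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy with respect to Q(s, a).
        Q = R + gamma * (P @ v).T           # (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Toy 2-state MDP: action 0 stays put, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0] = np.eye(2)
P[1] = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([[0.0, 1.0],    # state 0: switching pays 1
              [2.0, 0.0]])   # state 1: staying pays 2
policy, v = policy_iteration(P, R, gamma=0.9)
# -> policy [1 0], v approximately [19. 20.]
```

With γ = 0.9 the optimal values solve v(1) = 2 + 0.9·v(1) = 20 and v(0) = 1 + 0.9·v(1) = 19; policy iteration reaches this fixed point in two sweeps here, but at scale the per-sweep S×S solve is exactly the bottleneck the inexact methods target.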