Model-Based Policy Iterations for Nonlinear Systems via Controlled Hamiltonian Dynamics
Author: Mario Sassano, Thulasi Mylvaganam, Alessandro Astolfi
Year of publication: 2023
Source: IEEE Transactions on Automatic Control, 68:2683–2698
ISSN: 2334-3303 (online), 0018-9286 (print)
Description: The infinite-horizon optimal control problem for nonlinear systems is studied. In the context of model-based, iterative learning strategies, we propose an alternative definition and construction of the temporal difference error arising in Policy Iteration strategies. In such architectures the error is computed via the evolution of the Hamiltonian function (or, possibly, of its integral) along the trajectories of the closed-loop system. Herein the temporal difference error is instead obtained via two subsequent steps: first, the dynamics of the underlying costate variable in the Hamiltonian system are steered by means of a (virtual) control input so that the stable invariant manifold becomes externally attractive; second, the distance from invariance of the manifold, induced by approximate solutions, yields a natural candidate measure for the policy evaluation step. The policy improvement phase is then performed by means of standard gradient descent methods, which allow the weights of the underlying functional approximator to be updated correctly. This architecture yields an iterative (episodic) learning scheme based on a scalar, constant reward at each iteration, the value of which is insensitive to the length of the episode, in the original spirit of Reinforcement Learning strategies for discrete-time systems. Finally, the theory is validated by means of a numerical simulation involving an automatic flight control problem.
Database: OpenAIRE
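The abstract refers to the Hamiltonian function, the associated costate dynamics, and the stable invariant manifold without giving formulas. As a reading aid, the display below recalls the standard objects involved; the control-affine dynamics and the quadratic control penalty are illustrative assumptions of this sketch, not details taken from the paper.

```latex
% Standard objects behind the abstract. The control-affine dynamics and the
% quadratic control penalty are illustrative assumptions of this sketch.
\begin{align*}
  &\dot{x} = f(x) + g(x)\,u, \qquad
   J(u) = \int_0^{\infty} \bigl( q(x(t)) + u(t)^{\top} R\, u(t) \bigr)\,\mathrm{d}t,\\
  &H(x,p) = p^{\top} f(x) + q(x)
   - \tfrac{1}{4}\, p^{\top} g(x) R^{-1} g(x)^{\top} p, \qquad
   u^{\star}(x,p) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} p,\\
  &\dot{x} = \frac{\partial H}{\partial p}(x,p)^{\top}, \qquad
   \dot{p} = -\frac{\partial H}{\partial x}(x,p)^{\top}, \qquad
   \mathcal{M} = \bigl\{ (x,p) : p = \nabla V(x) \bigr\}, \quad
   H\bigl(x, \nabla V(x)\bigr) = 0,
\end{align*}
```

where V denotes the optimal value function solving the stationary Hamilton–Jacobi–Bellman equation. With an approximation V_w(x) = w^T phi(x), the residual H(x, grad V_w(x)) is one natural measure of the distance from invariance of the graph of grad V_w, in the spirit of the policy evaluation step described in the abstract.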
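The episodic scheme described in the abstract (a scalar reward per episode, gradient-descent policy improvement acting on approximator weights) can be illustrated with a minimal numerical sketch. The following is a schematic rendition, not the authors' construction: the virtual-input steering of the costate dynamics is replaced by a plain HJB-residual evaluation along closed-loop trajectories, and the scalar plant, basis functions, and step sizes are illustrative assumptions.

```python
# Schematic episodic policy-iteration loop driven by a distance-from-invariance
# (HJB-residual) error. NOT the authors' construction: the virtual-input
# steering of the costate dynamics is replaced by a residual evaluation along
# closed-loop trajectories. All modeling choices below are illustrative.
import numpy as np

# Scalar plant dx/dt = f(x) + g(x)*u with running cost q(x) + R*u^2.
f = lambda x: x**3
g = lambda x: 1.0
q = lambda x: x**2
R = 1.0

# Approximate value function V_w(x) = w . phi(x) with phi(x) = [x^2, x^4];
# only the gradient d(phi)/dx is needed below.
phi_grad = lambda x: np.array([2.0 * x, 4.0 * x**3])

def Vx(x, w):
    """Gradient dV_w/dx of the approximate value function."""
    return w @ phi_grad(x)

def policy(x, w):
    """Minimizing control u = -(1/2) R^{-1} g(x) dV_w/dx."""
    return -0.5 * g(x) * Vx(x, w) / R

def residual(x, w):
    """HJB residual H(x, dV_w/dx); it vanishes on the invariant manifold."""
    p = Vx(x, w)
    return p * f(x) + q(x) - 0.25 * g(x) ** 2 * p**2 / R

def episode(w, x0=1.0, dt=1e-3, steps=3000):
    """Roll out the closed loop (explicit Euler); return the visited states
    and the scalar episode reward: the mean squared residual, whose value
    does not scale with the episode length."""
    xs, x = [], x0
    for _ in range(steps):
        xs.append(x)
        x = x + dt * (f(x) + g(x) * policy(x, w))
    xs = np.array(xs)
    return xs, float(np.mean(residual(xs, w) ** 2))

def improve(w, xs, lr=1e-2):
    """One gradient-descent step on the mean squared residual (policy
    improvement acting directly on the approximator weights)."""
    grad = np.zeros_like(w)
    for x in xs:
        e = residual(x, w)
        # de/dw = (f(x) - (1/2) g^2 (dV_w/dx) / R) * d(dV_w/dx)/dw
        grad += 2.0 * e * (f(x) - 0.5 * g(x) ** 2 * Vx(x, w) / R) * phi_grad(x)
    return w - lr * grad / len(xs)

w = np.array([2.0, 0.0])  # initial weights; the induced policy stabilizes x0
for k in range(20):
    xs, reward = episode(w)
    w = improve(w, xs)
    print(f"iteration {k:2d}: episode reward {reward:.4e}, w = {w}")
```

Here the constant scalar reward per episode is the averaged squared residual, mimicking the episode-length insensitivity highlighted in the abstract; the external attractivity of the stable manifold emphasized in the paper is not modeled in this toy loop.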