Author:	Jongeneel, Wouter; Kuhn, Daniel; Li, Mengmeng
Year of publication:	2023
Subject:	Mathematics - Optimization and Control; Statistics - Machine Learning; 60F10; 90C26
Document type:	Working Paper
Description:	Motivated by policy gradient methods in the context of reinforcement learning, we identify a large deviation rate function for the iterates generated by stochastic gradient descent for possibly non-convex objectives satisfying a Polyak-Łojasiewicz condition. Leveraging the contraction principle from large deviations theory, we illustrate the potential of this result by showing how convergence properties of policy gradient with a softmax parametrization and an entropy-regularized objective can be naturally extended to a wide spectrum of other policy parametrizations. Comment: v3; comments are welcome
Database:	arXiv
External link:	http://arxiv.org/abs/2311.07411