Author:	Kose, Umit; Ruszczynski, Andrzej
Publication year:	2020
Subject:	Mathematics - Optimization and Control; Computer Science - Machine Learning; 49L20; 62L20; 90C39
Document type:	Working Paper
Description:	We consider reinforcement learning with performance evaluated by a dynamic risk measure. We construct a projected risk-averse dynamic programming equation and study its properties. Then we propose risk-averse counterparts of the methods of temporal differences and we prove their convergence with probability one. We also perform an empirical study on a complex transportation problem.
Database:	arXiv
External link:	http://arxiv.org/abs/2003.00780