Transition Based Discount Factor for Model Free Algorithms in Reinforcement Learning

Author: Abhinav Sharma, Ruchir Gupta, K. Lakshmanan, Atul Gupta
Language: English
Publication year: 2021
Source: Symmetry, Vol 13, Iss 7, p 1197 (2021)
Document type: article
ISSN: 2073-8994
DOI: 10.3390/sym13071197
Description: Reinforcement Learning (RL) enables an agent to learn control policies for achieving its long-term goals. One key parameter of RL algorithms is the discount factor, which scales down future cost in a state’s current value estimate. This study introduces and analyses a transition-based discount factor in two model-free reinforcement learning algorithms, Q-learning and SARSA, and shows their convergence using the theory of stochastic approximation for finite state and action spaces. This causes asymmetric discounting, favouring some transitions over others, which (1) allows faster convergence than the constant-discount-factor variants of these algorithms, as demonstrated by experiments on the Taxi domain and MountainCar environments; and (2) provides better control over whether the RL agent learns a risk-averse or risk-taking policy, as demonstrated in a Cliff Walking experiment.
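The core idea described above can be sketched as a Q-learning update in which the discount is a function of the transition (s, a, s') rather than a constant. This is only an illustrative reading of the abstract, not the paper's exact formulation; the function `gamma_fn` and all variable names here are hypothetical.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha, gamma_fn):
    """One tabular Q-learning update with a transition-dependent
    discount: gamma_fn(s, a, s_next) returns a value in [0, 1) for
    each transition (a hypothetical user-supplied function)."""
    target = r + gamma_fn(s, a, s_next) * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy usage: 2 states, 2 actions. Transitions into state 1 are
# discounted more heavily, so futures reached through that state
# contribute less to the value estimate (asymmetric discounting).
Q = np.zeros((2, 2))
gamma_fn = lambda s, a, s_next: 0.9 if s_next == 0 else 0.5
Q = q_learning_step(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5,
                    gamma_fn=gamma_fn)
```

Discounting transitions into risky states more strongly, as in the toy `gamma_fn` above, is one way such a scheme could bias the agent toward a risk-averse policy, in the spirit of the Cliff Walking experiment mentioned in the abstract.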
Database: Directory of Open Access Journals