Balancing Policy Improvement and Evaluation in Risk-Sensitive Satisficing Algorithm

Authors:	Hiroaki Wakabayashi, Tatsuji Takahashi, Takumi Kamiya
Publication year:	2021
Subject:	Computer science; Lag; Reinforcement learning; Satisficing; Aspiration level; Space (commercial competition); Risk sensitive; Lambda; Algorithm; TRACE (psycholinguistics)
Source:	Advances in Intelligent Systems and Computing ISBN: 9783030731120
DOI:	10.1007/978-3-030-73113-7_16
Description:	Reducing the search space is one of the challenges in reinforcement learning. One satisficing reinforcement learning algorithm, commonly known as RS+GRC, reduces a large search space by setting an aspiration level. However, a lag between policy evaluation and policy improvement, caused by policy feedback, prevents proper exploration. We therefore propose RS(\(\lambda\)), an eligibility trace-based method that eliminates this lag. We demonstrate that RS(\(\lambda\)) learns efficiently toward behavior policy-based satisfaction.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::b408a6faa42330f584fa55806ebe0c01 https://doi.org/10.1007/978-3-030-73113-7_16