Balancing Policy Improvement and Evaluation in Risk-Sensitive Satisficing Algorithm

Authors: Hiroaki Wakabayashi, Tatsuji Takahashi, Takumi Kamiya
Publication year: 2021
Subject:
Source: Advances in Intelligent Systems and Computing, ISBN: 9783030731120
DOI: 10.1007/978-3-030-73113-7_16
Description: Reducing the search space is one of the challenges in reinforcement learning. One of the satisficing reinforcement learning algorithms, commonly known as RS+GRC, reduces a large search space by setting an aspiration level. However, a lag between policy evaluation and policy improvement, caused by policy feedback, prevents proper exploration. We therefore propose RS(\(\lambda\)), an eligibility trace-based method that eliminates this lag. We demonstrate that RS(\(\lambda\)) learns efficiently toward behavior policy-based satisfaction.
Database: OpenAIRE