Description: |
Reducing the search space is one of the challenges in reinforcement learning. The satisficing reinforcement learning algorithm commonly known as RS+GRC reduces a large search space by setting an aspiration level. However, a lag between policy evaluation and policy improvement, caused by policy feedback, prevents proper exploration. We therefore propose RS(\(\lambda \)), an eligibility trace-based method that eliminates this lag. We demonstrate that RS(\(\lambda \)) learns efficiently toward behavior policy-based satisfaction. |