Description: |
A major challenge in reinforcement learning (RL) is the use of tabular representations to store learned policies for problems with a large number of states or state-action pairs. Function approximation is a promising tool to overcome this deficiency. This approach uses parameterized functions instead of a table to represent learned knowledge and enables generalization. However, existing schemes cannot solve realistic RL problems, whose demands on approximation accuracy and efficiency grow rapidly. In this paper, we extend the architecture of Sparse Distributed Memories (SDMs) and propose a novel online methodology, similarity-aware Kanerva coding (SAK), that closely represents the learned knowledge for very large-scale problems with significantly fewer parameterized components. SAK directly measures the real distances between state variables in all dimensions and reformulates a new state similarity metric with an improved definition of state closeness. As a result, our scheme accurately distributes and generalizes knowledge among related states. We further enhance SAK's efficiency by activating only a limited number of sufficiently similar prototype states for value approximation, which reduces the risk of over-generalization. In addition, SAK eliminates size tuning and prototype reallocation for the prototype set, resulting in not only broader scalability but also significant savings in the number of prototypes and the computational overhead needed for RL. Our extensive experimental results show that SAK achieves more than a 48% improvement in learning quality over existing schemes, and reveal that SAK consistently learns good policies with small overhead and short training times, even with roughly tuned scheme parameters.