Popis: |
In a specific class of black-box optimization algorithms that search for the optimal probability distribution with respect to an expected utility in reinforcement learning, higher-dimensional decision variables increase the cost and slow down learning. We clarify that the variance of the sampling probability distribution affects both the cost and the learning speed; in particular, there is a trade-off between the two. In this paper, we propose two tricks to improve both the learning speed and the cost. The first trick is to employ a small-variance sampling distribution to reduce the cost; as a side effect, it slows convergence. The second trick is to reduce the dimensionality of the decision variable to improve the learning speed. We evaluate the effects of these tricks on a 2D arm-reaching task. |