MBB: Model-Based Baseline for Global Guidance of Model-Free Reinforcement Learning via Lower-Dimensional Solutions
Authors: | Lyu, Xubo; Li, Site; Siriya, Seth; Pu, Ye; Chen, Mo |
---|---|
Year of publication: | 2020 |
Subject: | |
Document type: | Working Paper |
Description: | One spectrum on which robotic control paradigms lie is the degree to which a model of the environment is involved, from methods that are completely model-free such as model-free RL, to methods that require a known model such as optimal control, with other methods such as model-based RL somewhere in the middle. On one end of the spectrum, model-free RL can learn control policies for high-dimensional (hi-dim), complex robotic tasks through trial and error without knowledge of a model of the environment, but tends to require a large amount of data. On the other end, "classical methods" such as optimal control generate solutions without collecting data, but assume that an accurate model of the system and environment is known and are mostly limited to problems with low-dimensional (lo-dim) state spaces. In this paper, we bring the two ends of the spectrum together. Although models of hi-dim systems and environments may not exist, lo-dim approximations of these systems and environments are widely available, especially in robotics. Therefore, we propose to solve hi-dim, complex robotic tasks in two stages. First, assuming a coarse model of the hi-dim system, we compute a lo-dim value function for the lo-dim version of the problem using classical methods (e.g., value iteration and optimal control). Then, the lo-dim value function is used as a baseline function to warm-start the model-free RL process that learns hi-dim policies. The lo-dim value function provides global guidance for model-free RL, alleviating its data inefficiency. We demonstrate our approach on two robot learning tasks with hi-dim state spaces and observe significant improvement in policy performance and learning efficiency. We also give an empirical analysis of our method with a third task. Comment: Submitted to the 2022 IEEE International Conference on Robotics and Automation (ICRA 2022) |
Database: | arXiv |
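
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 1-D chain model, the reward of -1 per step, the discount factor, the `project` function, and all names are assumptions made for illustration. Stage 1 runs value iteration on a coarse lo-dim model; stage 2 subtracts the resulting lo-dim value function from the returns of a hi-dim trajectory, which is one common way to use a baseline in policy-gradient methods and may differ from the paper's exact formulation.

```python
# Stage 1: value iteration on a hypothetical lo-dim model (a 1-D chain of cells).
# Stage 2: use the lo-dim value function as a baseline for hi-dim returns.
# All constants and names below are illustrative assumptions.

N_CELLS = 5          # lo-dim state space: cells 0..4
GOAL = 4             # terminal cell
GAMMA = 0.9          # discount factor
ACTIONS = (-1, +1)   # move left / move right


def lo_dim_step(cell, action):
    """Coarse deterministic model: move, clip to the chain, reward -1 per step."""
    next_cell = min(max(cell + action, 0), N_CELLS - 1)
    return next_cell, -1.0


def value_iteration(tol=1e-9):
    """Return the optimal value of each lo-dim cell under the coarse model."""
    values = [0.0] * N_CELLS
    while True:
        delta = 0.0
        for cell in range(N_CELLS):
            if cell == GOAL:
                continue  # terminal state keeps value 0
            best = max(
                reward + GAMMA * values[nxt]
                for nxt, reward in (lo_dim_step(cell, a) for a in ACTIONS)
            )
            delta = max(delta, abs(best - values[cell]))
            values[cell] = best
        if delta < tol:
            return values


def project(hi_dim_state):
    """Hypothetical projection: the first component is the lo-dim cell index."""
    return int(hi_dim_state[0])


def advantages(states, rewards, lo_values):
    """Discounted return-to-go minus the lo-dim baseline at each visited state."""
    result = []
    ret = 0.0
    for state, reward in zip(reversed(states), reversed(rewards)):
        ret = reward + GAMMA * ret
        result.append(ret - lo_values[project(state)])
    result.reverse()
    return result


lo_values = value_iteration()
# Hi-dim states carry an extra component that the lo-dim model ignores.
trajectory = [(2, 0.3), (3, -0.1)]
print(advantages(trajectory, [-1.0, -1.0], lo_values))
```

In this toy setting, a trajectory that follows the shortest path to the goal receives advantages close to zero, while detours receive negative advantages, so the baseline supplies a global signal about progress before the hi-dim policy has gathered much data.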