Multi-Reward Architecture based Reinforcement Learning for Highway Driving Policies
Authors: Yuesheng He, Wei Yuan, Ming Yang, Bing Wang, Chunxiang Wang
Year: 2019
Subject: Computer science, Artificial intelligence, Reinforcement learning, Domain knowledge, Collision, Process (computing), Function (engineering), Representation (mathematics)
Source: ITSC (IEEE Intelligent Transportation Systems Conference)
DOI: 10.1109/itsc.2019.8917304
Description: A safe and efficient driving policy is essential for future autonomous highway driving. However, driving policies are hard to model because of the diversity of scenes and the uncertainty of interactions with surrounding vehicles. State-of-the-art deep reinforcement learning methods fail to learn good domain knowledge for highway driving policies when trained with a single-reward architecture. This paper proposes Multi-Reward Architecture (MRA) based reinforcement learning for highway driving policies. The single reward function is decomposed into multiple reward functions to better represent the multiple dimensions of a driving policy: besides a large penalty for collisions, the overall reward is split into three components, a reward for speed, a reward for overtaking, and a reward for lane-changing. Each reward then trains one branch of a Q-network that captures the corresponding domain knowledge. The Q-network has two parts: a low-level network shared by three high-level branches, each of which approximates the Q-values for its own reward function. The agent car selects the action that maximizes the sum of the Q-value vectors from the three branches (a sketch of this branched Q-network is given below the record). Experiments are conducted on a simulation platform that reproduces the highway driving process and provides the agent car with commonly used sensor data: camera images and point clouds. The results show that the proposed method outperforms a single-reward DQN baseline on three metrics: higher speed, fewer lane changes, and more overtakes, making it more efficient and safer for future autonomous highway driving.
Database: OpenAIRE
External link:
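
Below is a minimal PyTorch sketch of the branched Q-network the description outlines. It is an illustration under stated assumptions, not the authors' implementation: the class name MRAQNetwork, the layer sizes, the use of a flattened state vector, and the discrete action set are all hypothetical, since the record does not specify them. It shows the two ideas the description names: a low-level network shared by three reward-specific high-level branches, and greedy action selection over the sum of the per-branch Q vectors.

```python
# Hypothetical sketch of a Multi-Reward Architecture (MRA) Q-network.
# Names, layer sizes, and the action encoding are assumptions; the record
# only specifies the shared trunk, the three reward branches, and action
# selection by summing the branch Q vectors.
import torch
import torch.nn as nn


class MRAQNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        # Low-level network shared by all reward branches.
        self.shared = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One high-level branch per decomposed reward:
        # speed, overtaking, and lane-change.
        self.branches = nn.ModuleDict({
            name: nn.Linear(hidden, n_actions)
            for name in ("speed", "overtake", "lane_change")
        })

    def forward(self, state: torch.Tensor) -> dict[str, torch.Tensor]:
        features = self.shared(state)
        # Each branch approximates the Q-values for its own reward function.
        return {name: head(features) for name, head in self.branches.items()}

    def act(self, state: torch.Tensor) -> torch.Tensor:
        # The agent acts greedily on the SUM of the per-branch Q vectors.
        q_branches = self.forward(state)
        q_total = torch.stack(list(q_branches.values())).sum(dim=0)
        return q_total.argmax(dim=-1)


# Usage with an assumed 32-dimensional state and 5 discrete maneuvers.
net = MRAQNetwork(state_dim=32, n_actions=5)
action = net.act(torch.randn(1, 32))  # greedy action for one state
```

In training, each branch would presumably regress its own TD target computed from its own decomposed reward, with gradients from all three branches flowing back into the shared trunk; the big collision penalty can be folded into every branch's reward or handled as a separate terminal cost.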