Advanced planning for autonomous vehicles using reinforcement learning and deep inverse reinforcement learning
Authors: Dimitar Petrov Filev, Panagiotis Tsiotras, Changxi You, Jianbo Lu
Year of publication: 2019
Subject: Reinforcement learning; Inverse reinforcement learning (implied); Markov decision process; Principle of maximum entropy; Feature (machine learning); Parameterized complexity; Fuel efficiency; Intelligent transportation systems; Artificial intelligence; Computer science
Source: Robotics and Autonomous Systems, 114:1-18
ISSN: 0921-8890
DOI: 10.1016/j.robot.2019.01.003
Description: Autonomous vehicles promise to improve traffic safety while at the same time increasing fuel efficiency and reducing congestion. They represent the main trend in future intelligent transportation systems. This paper concentrates on the planning problem of autonomous vehicles in traffic. We model the interaction between the autonomous vehicle and the environment as a stochastic Markov decision process (MDP) and treat the driving style of an expert driver as the target to be learned. The road geometry is taken into account in the MDP model in order to capture more diverse driving styles. The desired, expert-like driving behavior of the autonomous vehicle is obtained in two ways: first, we design the reward function of the corresponding MDP and determine the optimal driving strategy for the autonomous vehicle using reinforcement learning techniques; second, we collect a number of demonstrations from an expert driver and learn the optimal driving strategy from these data using inverse reinforcement learning. The unknown reward function of the expert driver is approximated using a deep neural network (DNN). We clarify and validate the application of the maximum entropy principle (MEP) to learning the DNN reward function, and provide the necessary derivations for using the maximum entropy principle to learn a parameterized feature (reward) function. Simulation results demonstrate the desired driving behaviors of an autonomous vehicle using both the reinforcement learning and inverse reinforcement learning techniques.
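The maximum-entropy inverse reinforcement learning step summarized in the abstract can be illustrated on a toy problem. The sketch below is not the paper's implementation: it replaces the traffic MDP with a deterministic five-state chain and the DNN reward approximator with a tabular reward (one weight per state). What it does reproduce are the standard MaxEnt IRL ingredients the abstract refers to: soft value iteration, the resulting soft-optimal stochastic policy, and a gradient that matches the learner's expected state-visitation counts to those of the expert demonstrations.

```python
import math

# Toy deterministic chain MDP: states 0..4, actions 0 = left, 1 = right.
# (Stand-in for the paper's traffic MDP; all names here are illustrative.)
N_S, N_A, GAMMA, HORIZON = 5, 2, 0.9, 10

def step(s, a):
    """Deterministic transition: move left or right, clipped at the ends."""
    return max(0, s - 1) if a == 0 else min(N_S - 1, s + 1)

def soft_value_iteration(reward):
    """Soft (max-entropy) Bellman backup: V(s) = log sum_a exp(r(s) + gamma V(s'))."""
    V = [0.0] * N_S
    for _ in range(100):
        V = [math.log(sum(math.exp(reward[s] + GAMMA * V[step(s, a)])
                          for a in range(N_A))) for s in range(N_S)]
    # Soft-optimal stochastic policy: pi(a|s) = exp(Q(s,a) - V(s)).
    return [[math.exp(reward[s] + GAMMA * V[step(s, a)] - V[s])
             for a in range(N_A)] for s in range(N_S)]

def visitation(policy, start=0):
    """Expected state-visitation counts over the finite horizon."""
    d = [0.0] * N_S
    d[start] = 1.0
    total = d[:]
    for _ in range(HORIZON):
        nd = [0.0] * N_S
        for s in range(N_S):
            for a in range(N_A):
                nd[step(s, a)] += d[s] * policy[s][a]
        d = nd
        total = [t + x for t, x in zip(total, d)]
    return total

# "Expert" demonstrations: always drive right, ending parked at state 4.
demos = [[min(t, N_S - 1) for t in range(HORIZON + 1)] for _ in range(3)]
expert = [0.0] * N_S
for traj in demos:
    for s in traj:
        expert[s] += 1.0 / len(demos)

# MaxEnt IRL: gradient ascent on the reward weights; the gradient is the
# gap between expert and learner state-visitation counts.
reward = [0.0] * N_S
for _ in range(200):
    mu = visitation(soft_value_iteration(reward))
    reward = [r + 0.1 * (e - m) for r, e, m in zip(reward, expert, mu)]
```

After training, the learned reward is highest at the state the expert occupies most (state 4), so the soft-optimal policy reproduces the expert's rightward driving style. The paper's DNN reward plays the role of the tabular `reward` vector here, with the same visitation-matching gradient propagated through the network.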
Database: OpenAIRE
External link: