Optimizing Traffic at Intersections With Deep Reinforcement Learning.

Author: Boyko, Nataliya, Mokryk, Yaroslav, Hwang, Yuh-Shyan
Subject:
Source: Journal of Engineering (2314-4912); 11/16/2024, Vol. 2024, p1-28, 28p
Abstract: Background: The purpose of this work is to develop a method of traffic optimization at intersections using deep reinforcement learning (DRL) by creating models for controlling autonomous vehicles and traffic lights in a simulated environment. The practical value of the research lies in comparing the performance of existing algorithms on both the traffic-light control and autonomous-vehicle control tasks, comparing their results in different scenarios, and creating a framework for testing further such methods in the future.
Methods: The goal of the research is achieved by applying the DQN, SAC, A2C, and PPO algorithms to the task of driving vehicles across the intersection and to the task of controlling the traffic lights in a simulated environment. The selected algorithms are compared using metrics such as average reward, average waiting time (AWT), average queue length (AQL), and training time. For the vehicle-control task, the reward function is modeled by taking into consideration the distance to the goal, i.e., the part of the intersection to be reached by the car, and the agent is penalized for violating traffic rules. The agents trained to control the traffic-light phases have their rewards modeled around minimizing the AWT and AQL metrics. Additionally, the traffic-light control agents are trained with both discrete and continuous action spaces.
Results: Most DRL algorithms suffer from catastrophic forgetting (catastrophic interference), DQN in particular. This problem can be remedied by using Hindsight Experience Replay (HER) or other replay buffers. The PPO algorithm, especially with a low value of the entropy-coefficient hyperparameter, produced the most stable results during training, although it required more iterations to achieve them. It was found that the dependence of the reward function on the AWT and AQL metrics has a significant impact on the training results.
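The reward shaping described in the Methods section can be sketched as follows. This is a minimal illustration only: the function names, weights, and penalty values are assumptions, not taken from the paper.

```python
def vehicle_reward(prev_distance, distance_to_goal, violated_rule,
                   violation_penalty=10.0):
    """Hypothetical reward for the vehicle-control agent: positive for
    progress toward the target part of the intersection, with a penalty
    for breaking traffic rules (values are illustrative)."""
    reward = prev_distance - distance_to_goal  # progress toward the goal
    if violated_rule:
        reward -= violation_penalty
    return reward


def traffic_light_reward(avg_waiting_time, avg_queue_length,
                         w_wait=1.0, w_queue=0.5):
    """Hypothetical reward for the traffic-light agent: both AWT and AQL
    are driven toward zero; the weights are illustrative."""
    return -(w_wait * avg_waiting_time + w_queue * avg_queue_length)
```

A reward of this shape makes the traffic-light agent's return directly sensitive to the AWT and AQL metrics, which the Results section identifies as having a significant impact on training.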
Conclusions: This work compared methods for controlling traffic lights and autonomous vehicles at a signalized intersection in a custom simulated environment. A novel environment, compatible with existing frameworks such as Gym, was developed for training and validating reinforcement learning algorithms. [ABSTRACT FROM AUTHOR]
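The conclusions mention a Gym-compatible environment; the skeleton below is a minimal sketch of what such an interface looks like. The class name, observation layout, and placeholder dynamics are assumptions; a real environment would wrap an actual traffic simulator rather than the stub shown here.

```python
import numpy as np


class IntersectionEnv:
    """Minimal sketch of a Gym-style intersection environment for
    traffic-light control (illustrative; not the paper's implementation).
    It follows the reset()/step() protocol that Gym-compatible RL
    frameworks expect."""

    N_APPROACHES = 4  # number of approaches to the intersection

    def __init__(self):
        # Discrete action: index of the traffic-light phase to activate.
        self.n_actions = self.N_APPROACHES
        # Observation: per-approach queue length and average waiting time.
        self._state = np.zeros(2 * self.N_APPROACHES, dtype=np.float32)

    def reset(self, seed=None):
        self._state[:] = 0.0
        return self._state.copy(), {}  # observation, info

    def step(self, action):
        assert 0 <= action < self.n_actions
        # A real environment would advance the traffic simulator here and
        # read back the updated queue lengths and waiting times.
        queues = self._state[: self.N_APPROACHES]
        waits = self._state[self.N_APPROACHES :]
        reward = -float(queues.sum() + waits.sum())  # minimize AQL and AWT
        terminated, truncated = False, False
        return self._state.copy(), reward, terminated, truncated, {}
```

Exposing the standard reset()/step() protocol is what lets off-the-shelf implementations of DQN, SAC, A2C, and PPO be trained and compared against the same environment.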
Database: Complementary Index