An Online Reinforcement Learning Approach for User-Optimal Parking Searching Strategy Exploiting Unique Problem Property and Network Topology

Authors:	Jun Xiao, Yingyan Lou
Year of publication:	2022
Subject:	Mathematical optimization; Parking guidance and information; Computer science; Mechanical Engineering; Approximation algorithm; Markov process; Flow network; Network topology; Computer Science Applications; Dynamic programming; Automotive Engineering; Reinforcement learning; Markov decision process
Source:	IEEE Transactions on Intelligent Transportation Systems. 23:8157-8169
ISSN:	1558-0016, 1524-9050
Description:	This paper investigates the idea of introducing learning algorithms into parking guidance and information systems that employ a central server, in order to provide estimated optimal parking searching strategies to travelers. The parking searching process on a network with uncertain parking availability can naturally be modeled as a Markov Decision Process (MDP). Such an MDP with full information can easily be solved by dynamic programming approaches. However, the probabilities of finding parking are difficult to define and calculate. Learning algorithms are suitable for addressing this issue. We propose an algorithm based on Q-learning, where a unique property of the parking searching MDP and the topology of the underlying transportation network are incorporated and utilized to improve its performance. This modification allows us to reduce the size of the learning problem dramatically, and thus the amount of data required to learn the optimal strategy. Numerical experiments conducted on a toy network with fixed parking probabilities show that the proposed learning algorithm outperforms the original Q-learning algorithm and three greedy heuristics in terms of the quality of the approximated optimal solution as well as the amount of training data required. Our numerical experiments on a real network with time-dependent underlying probabilities show that effective searching strategies can be achieved by the proposed algorithm, even though the learning algorithms treat the parking probabilities as constant during each exploration-exploitation cycle. The results again demonstrate that the proposed modified Q-learning algorithm significantly outperforms the original Q-learning with the same amount of training data. The results also provide insights into how the length and the split of the exploration-exploitation cycle affect the effectiveness of the proposed learning algorithm.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::2e385fe875309b1f8832676a352267e8 https://doi.org/10.1109/tits.2021.3076408