UAV Control for Wireless Service Provisioning in Critical Demand Areas: A Deep Reinforcement Learning Approach
Author: Kim Khoa Nguyen, Tai Manh Ho, Mohamed Cheriet
Year of publication: 2021
Subject: Trust region, Computer Networks and Communications, Computer science, Distributed computing, Aerospace Engineering, ComputerApplications_COMPUTERSINOTHERSYSTEMS, Energy consumption, Backhaul (telecommunications), Base station, Automotive Engineering, Wireless, Reinforcement learning, Electrical and Electronic Engineering, Communication channel, Efficient energy use
Source: IEEE Transactions on Vehicular Technology, 70:7138-7152
ISSN: 1939-9359, 0018-9545
DOI: 10.1109/tvt.2021.3088129
Description: In this paper, we investigate the problem of wireless service provisioning through a rotary-wing UAV that can serve as an aerial base station (BS) to communicate with multiple ground terminals (GTs) in an area of boosted demand. Our objective is to optimize the UAV control to maximize the UAV's energy efficiency, accounting for both aerodynamic energy and communication energy while ensuring the communication requirements of each GT and of the backhaul link between the UAV and the terrestrial BS. The mobility of the UAV and the GTs leads to time-varying channel conditions that make the environment dynamic. We formulate a nonconvex optimization problem for controlling the UAV that considers the practical angle-dependent Rician fading channels between the UAV and the GTs, and between the UAV and the terrestrial BS. Traditional optimization approaches cannot handle the dynamic environment and the high complexity of the problem in real time. We propose a deep reinforcement learning-based approach, namely Deep Deterministic Policy Gradient (DDPG), to solve the formulated nonconvex UAV control problem with a continuous action space, taking into account the real-time state of the environment, including the time-varying UAV-ground channel conditions, the available onboard energy of the UAV, and the communication requirements of the GTs. However, the DDPG method may not achieve good performance in an unstable environment and involves a large number of hyperparameters. We therefore extend our approach with the Trust Region Policy Optimization (TRPO) method, which improves the performance of the UAV compared to the DDPG method in such a dynamic environment.
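To illustrate the DDPG controller described in the abstract, the sketch below shows an actor-critic update for a continuous UAV control action. It is a minimal sketch under stated assumptions, not the authors' implementation: the state and action dimensions, network sizes, hyperparameters, and the random replay batch are all hypothetical placeholders standing in for the paper's state (UAV-ground channel conditions, onboard energy, GT requirements) and reward (energy efficiency).

```python
# Minimal DDPG-style sketch (PyTorch) for continuous UAV control.
# All dimensions, network sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM = 8    # assumed: UAV position/velocity, residual energy, GT channel gains
ACTION_DIM = 3   # assumed: 3D velocity command for the rotary-wing UAV

class Actor(nn.Module):
    """Deterministic policy: maps the observed state to a bounded continuous action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh(),  # scale to physical limits outside
        )
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q-network: scores a (state, action) pair, e.g. expected energy-efficiency return."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
target_actor, target_critic = Actor(), Critic()
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
GAMMA, TAU = 0.99, 0.005

def ddpg_update(batch):
    """One DDPG step on a replay batch (state, action, reward, next_state, done)."""
    s, a, r, s2, done = batch
    with torch.no_grad():
        q_target = r + GAMMA * (1 - done) * target_critic(s2, target_actor(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()  # ascend Q w.r.t. the deterministic policy
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Polyak averaging of the target networks for stability
    for t, p in zip(target_actor.parameters(), actor.parameters()):
        t.data.mul_(1 - TAU).add_(TAU * p.data)
    for t, p in zip(target_critic.parameters(), critic.parameters()):
        t.data.mul_(1 - TAU).add_(TAU * p.data)

# Illustrative call with a random replay batch (batch size 32, fake transitions)
B = 32
batch = (torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM),
         torch.randn(B, 1), torch.randn(B, STATE_DIM), torch.zeros(B, 1))
ddpg_update(batch)
```

The paper's TRPO extension would replace the unconstrained policy-gradient step above with a KL-divergence-constrained (trust-region) policy update; that variant is not implemented in this sketch.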
Database: OpenAIRE
External link: