Decoupling Patrolling Tasks for Water Quality Monitoring: A Multi-Agent Deep Reinforcement Learning Approach

Autor: Dame Seck Diop, Samuel Yanes Luis, Manuel Perales Esteve, Sergio L. Toral Marin, Daniel Gutierrez Reina
Jazyk: angličtina
Rok vydání: 2024
Předmět:
Zdroj: IEEE Access, Vol 12, Pp 75559-75576 (2024)
Druh dokumentu: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3403790
Popis: This study proposes the use of an Autonomous Surface Vehicle (ASV) fleet with water quality sensors for efficient patrolling to monitor water resource pollution. This is formulated as a Patrolling Problem, which consists of planning and executing efficient routes to continuously monitor a given area. When patrolling Lake Ypacaraí with ASVs, the scenario transforms into a Partially Observable Markov Game (POMG) due to unknown pollution levels. Given the computational complexity, a Multi-Agent Deep Reinforcement Learning (MADRL) approach is adopted, with a common policy for homogeneous agents. A consensus algorithm assists in collision avoidance and coordination. The work introduces exploration and reinforcement phases to the patrolling problem. The Exploration Phase aims at homogeneous map coverage, while the Intensification Phase prioritizes high polluted areas. The innovative introduction of a transition variable, $\nu $ , efficiently controls the transition from exploration to intensification. Results demonstrate the superiority of the method, which outperforms a Single-Phase (trained on a single task) Deep Q-Network (DQN) by an average of 17% on the intensification task. The proposed multitask learning approach with parameter sharing, coupled with DQN training, outperforms Task-Specific DQN (two DQNs trained on separate tasks) by 6% in exploration and 13% in intensification. It also outperforms the heuristic-based Lawn Mower Path Planner (LMPP) and Random Wanderer Path Planner (RWPP) algorithms, by 35% and 20% on average respectively. Additionally, it outperforms a Particle Swarm Optimization-based Path Planner (PSOPP) by an average of 26%. The algorithm demonstrates adaptability in unforeseen scenarios, giving users flexibility in configuration.
Databáze: Directory of Open Access Journals