Learning controlled and targeted communication with the centralized critic for the multi-agent system.

Authors: Sun, Qingshuang; Yao, Yuan; Yi, Peng; Hu, YuJiao; Yang, Zhao; Yang, Gang; Zhou, Xingshe
Subject:
Source: Applied Intelligence; Jun2023, Vol. 53 Issue 12, p14819-14837, 19p
Abstract: Multi-agent deep reinforcement learning (MDRL) has attracted attention for solving complex tasks. Two main challenges of MDRL are non-stationarity and partial observability from the agents' perspective, both of which impair the agents' ability to learn cooperative policies. In this study, Controlled and Targeted Communication with the Centralized Critic (COTAC) is proposed, constructing a paradigm of centralized learning and decentralized execution with partial communication. It decouples how the multi-agent system obtains environmental information during training and during execution. Specifically, COTAC makes the environment faced by agents stationary in the training phase and learns partial communication to overcome the limitation of partial observability in the execution phase. On this basis, decentralized actors learn controlled and targeted communication along with policies optimized by centralized critics during training. As a result, agents learn both when to communicate when sending and how to perform targeted information aggregation when receiving. In addition, COTAC is evaluated on two multi-agent scenarios with continuous spaces. Experimental results demonstrate that only the agents holding important information choose to send messages, and receivers aggregate the received information in a targeted manner by identifying the relevant important content, which achieves better cooperation performance while reducing the communication traffic of the system. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
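
The abstract describes two learned mechanisms: a sending gate that decides when an agent broadcasts ("controlled" communication) and an attention-style aggregation over received messages ("targeted" communication). The following is a minimal illustrative sketch of that general idea, not the paper's actual architecture; all names (`W_msg`, `w_gate`, `W_qry`) and the specific gating/attention forms are assumptions for illustration. In the full method these parameters would be trained end-to-end with centralized critics.

```python
import numpy as np

# Hypothetical sketch: gated sending + attention-based aggregation.
# Parameters here are random; in a CTDE setup they would be learned
# with gradients from a centralized critic during training.
rng = np.random.default_rng(0)
n_agents, obs_dim, msg_dim = 4, 8, 6

W_msg = rng.standard_normal((n_agents, obs_dim, msg_dim)) * 0.1  # message encoders
w_gate = rng.standard_normal((n_agents, obs_dim)) * 0.1          # send gates
W_qry = rng.standard_normal((n_agents, obs_dim, msg_dim)) * 0.1  # attention queries

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

obs = rng.standard_normal((n_agents, obs_dim))

# Controlled sending: agent i broadcasts only if its learned gate fires,
# so only agents with (presumably) important information add traffic.
msgs, senders = [], []
for i in range(n_agents):
    gate = 1.0 / (1.0 + np.exp(-(obs[i] @ w_gate[i])))  # sigmoid gate
    if gate > 0.5:                                      # hard decision at execution
        msgs.append(obs[i] @ W_msg[i])
        senders.append(i)

# Targeted receiving: each agent attends over the messages actually sent,
# weighting the ones most relevant to its own observation.
aggregated = np.zeros((n_agents, msg_dim))
for i in range(n_agents):
    incoming = [m for j, m in zip(senders, msgs) if j != i]
    if incoming:
        q = obs[i] @ W_qry[i]
        scores = np.array([q @ m for m in incoming])
        attn = softmax(scores)                          # targeted weights
        aggregated[i] = attn @ np.vstack(incoming)

# aggregated[i] would then be fed, together with obs[i], to agent i's actor.
print(len(senders), aggregated.shape)
```

The sketch shows why communication traffic drops: messages are only emitted when the gate exceeds its threshold, and each receiver forms a convex combination of incoming messages rather than consuming all of them equally.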