Dirichlet-Multinomial Counterfactual Rewards for Heterogeneous Multiagent Systems
Authors: | Nicholas Zerbel, Gaurav Dixit, Kagan Tumer |
---|---|
Year of publication: | 2019 |
Subject: |
Counterfactual thinking
Computer science; Process (engineering); Multi-agent system; Exploration problem; Machine learning; Dirichlet distribution; Domain knowledge; Artificial intelligence; Baseline (configuration management); Selection (genetic algorithm) |
Source: | MRS |
DOI: | 10.1109/mrs.2019.8901077 |
Description: | Multi-robot teams have been shown to be effective in accomplishing complex tasks that require tight coordination among team members. In homogeneous systems, recent work has demonstrated that “stepping stone” rewards are an effective way to provide agents with feedback on potentially valuable actions even when the agent-to-agent coupling requirements of an objective are not satisfied. In this work, we propose a new mechanism for inferring hypothetical partners in tightly-coupled, heterogeneous systems called Dirichlet-Multinomial Counterfactual Selection (DMCS). Testing in a modified multi-rover exploration problem, we show that agents using DMCS can learn to infer appropriate counterfactual partners and thereby receive more informative stepping stone rewards. We also show that DMCS outperforms a random partner selection baseline by over 40%, and we demonstrate how domain knowledge can be used to induce a prior to guide the agent learning process. Finally, we show that DMCS maintains superior performance for up to 15 distinct rover types, whereas the performance of the baseline degrades rapidly. |
Database: | OpenAIRE |
External link: |
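
The description above mentions selecting counterfactual partner types with a Dirichlet-multinomial model and using domain knowledge to induce a prior. The abstract does not give the update rule, so the following is only a minimal sketch of the general Dirichlet-multinomial idea, not the authors' implementation. The class name `DirichletMultinomialSelector`, its methods, the uniform default prior, and the rule "add one pseudo-count when the chosen type gave a more informative reward" are all hypothetical assumptions.

```python
import random


class DirichletMultinomialSelector:
    """Hypothetical selector over partner types (illustration only)."""

    def __init__(self, n_types, prior=None):
        # alpha[i] is the Dirichlet concentration (pseudo-count) for type i.
        # A uniform prior of 1.0 is an assumption; domain knowledge could
        # instead be encoded by passing larger values for favoured types.
        self.alpha = list(prior) if prior is not None else [1.0] * n_types

    def posterior_mean(self):
        # Expected multinomial probabilities under the current Dirichlet.
        total = sum(self.alpha)
        return [a / total for a in self.alpha]

    def sample_type(self, rng=random):
        # Draw theta ~ Dirichlet(alpha) via normalized Gamma draws,
        # then draw one partner type from the resulting multinomial.
        draws = [rng.gammavariate(a, 1.0) for a in self.alpha]
        total = sum(draws)
        weights = [d / total for d in draws]
        return rng.choices(range(len(self.alpha)), weights=weights, k=1)[0]

    def update(self, partner_type, reward_improved):
        # Assumed update rule: count the chosen type only when it led to
        # a more informative reward signal.
        if reward_improved:
            self.alpha[partner_type] += 1.0
```

A possible usage: construct `DirichletMultinomialSelector(3, prior=[1.0, 1.0, 2.0])` to favour the third rover type, call `sample_type()` each episode to pick a hypothetical partner, and call `update(...)` with the observed outcome. Under this sketch, types that repeatedly help accumulate pseudo-counts and are sampled more often.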