Interactive POMDPs with finite-state models of other agents
Authors: Alessandro Panella, Piotr J. Gmytrasiewicz
Year: 2017
Subjects: Stochastic process; Multi-agent system; Autonomous agent; Posterior probability; Prior probability; Probability distribution; Probabilistic logic; Partially observable Markov decision process; Bayesian inference; Indirect inference; Artificial intelligence
Source: Autonomous Agents and Multi-Agent Systems, 31:861–904
ISSN: 1573-7454, 1387-2532
Abstract: We consider an autonomous agent facing a stochastic, partially observable, multiagent environment. In order to compute an optimal plan, the agent must accurately predict the actions of the other agents, since they influence the state of the environment and, ultimately, the agent's utility. To do so, we propose a special case of interactive partially observable Markov decision process in which the agent does not explicitly model the other agents' beliefs and preferences, and instead represents them as stochastic processes implemented by probabilistic deterministic finite-state controllers (PDFCs). The agent maintains a probability distribution over the PDFC models of the other agents and updates this belief using Bayesian inference. Since the number of nodes of these PDFCs is unknown and unbounded, the agent places a Bayesian nonparametric prior distribution over the infinite-dimensional set of PDFCs. This allows the size of the learned models to adapt to the complexity of the observed behavior. Deriving the posterior distribution is in this case too complex for analytical computation; we therefore provide a Markov chain Monte Carlo algorithm that approximates the posterior beliefs over the other agents' PDFCs, given a sequence of (possibly imperfect) observations of their behavior. Experimental results show that the learned models converge behaviorally to the true ones. We consider two settings: one in which the agent first learns and then interacts with other agents, and one in which learning and planning are interleaved. We show that the agent's performance improves as a result of learning in both situations. Moreover, we analyze the dynamics that ensue when two agents simultaneously learn about each other while interacting, showing in an example environment that coordination emerges naturally from our approach. Furthermore, we demonstrate how an agent can exploit the learned models to perform indirect inference about the state of the environment via the modeled agent's actions.
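The central modeling device in the abstract is the PDFC: node transitions are deterministic given the observation, while action selection at each node is stochastic. A minimal sketch of such a controller, with an illustrative two-node example loosely styled after a tiger-like domain (all names, structure, and probabilities here are assumptions for illustration, not the paper's implementation):

```python
import random

class PDFC:
    """Sketch of a probabilistic deterministic finite-state controller:
    deterministic node transitions on observations, stochastic action
    choice per node."""

    def __init__(self, transitions, action_dists, start=0):
        self.transitions = transitions    # (node, observation) -> next node
        self.action_dists = action_dists  # node -> {action: probability}
        self.node = start

    def act(self, rng=random):
        # Sample an action from the current node's action distribution.
        dist = self.action_dists[self.node]
        actions, weights = zip(*dist.items())
        return rng.choices(actions, weights=weights)[0]

    def observe(self, obs):
        # Deterministic transition: the observation fully determines
        # the next controller node.
        self.node = self.transitions[(self.node, obs)]

# Illustrative two-node controller.
ctrl = PDFC(
    transitions={(0, "growl-left"): 1, (0, "growl-right"): 0,
                 (1, "growl-left"): 1, (1, "growl-right"): 0},
    action_dists={0: {"listen": 0.9, "open-left": 0.1},
                  1: {"listen": 0.9, "open-right": 0.1}},
)
a = ctrl.act()          # stochastic: "listen" or "open-left"
ctrl.observe("growl-left")  # deterministic: controller moves to node 1
```

The modeling agent would maintain a posterior over such controllers (including their number of nodes, via the nonparametric prior) and sample from it with MCMC; this sketch only shows the object being inferred.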
Database: OpenAIRE
External link: