Showing 1 - 10 of 25 results for the search: "Durugkar, Ishan"
Current approaches to learning cooperative multi-agent behaviors assume relatively restrictive settings. In standard fully cooperative multi-agent reinforcement learning, the learning algorithm controls $\textit{all}$ agents in the scenario, while in ...
External link: http://arxiv.org/abs/2404.10740
Goal-Conditioned Reinforcement Learning (RL) problems often have access to sparse rewards where the agent receives a reward signal only when it has achieved the goal, making policy optimization a difficult problem. Several works augment this sparse reward ...
External link: http://arxiv.org/abs/2310.06794
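As background for the abstract above, a sparse goal-conditioned reward of the kind it describes is typically just an indicator of goal attainment. The sketch below is illustrative only; the function name, distance metric, and tolerance are assumptions, not taken from the paper.

```python
import numpy as np

def sparse_goal_reward(state, goal, tol=0.05):
    # Reward is 1 only when the agent is within `tol` of the goal, else 0;
    # this is the generic sparse-reward shape, not the paper's exact definition.
    return float(np.linalg.norm(np.asarray(state) - np.asarray(goal)) <= tol)
```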
Given a dataset of expert agent interactions with an environment of interest, a viable method to extract an effective agent policy is to estimate the maximum likelihood policy indicated by this data. This approach is commonly referred to as behavioral cloning ...
External link: http://arxiv.org/abs/2211.04005
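For context, estimating the maximum likelihood policy from expert data is the standard behavioral-cloning setup. A minimal PyTorch-style sketch follows; the network shape, discrete action space, and hyperparameters are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn

# Minimal behavioral cloning: fit a policy to expert (state, action) pairs by
# maximizing log-likelihood, i.e. minimizing cross-entropy for discrete actions.
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
nll = nn.CrossEntropyLoss()

def bc_step(states, expert_actions):
    # states: FloatTensor [batch, 4]; expert_actions: LongTensor [batch]
    optimizer.zero_grad()
    loss = nll(policy(states), expert_actions)
    loss.backward()
    optimizer.step()
    return loss.item()
```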
Current approaches to multi-agent cooperation rely heavily on centralized mechanisms or explicit communication protocols to ensure convergence. This paper studies the problem of distributed multi-agent learning without resorting to centralized components ...
External link: http://arxiv.org/abs/2206.00233
This paper deals with the problem of learning a skill-conditioned policy that acts meaningfully in the absence of a reward signal. Mutual information based objectives have shown some success in learning skills that reach a diverse set of states in the ...
External link: http://arxiv.org/abs/2110.15331
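As general background (not a quotation from the paper), mutual-information skill objectives of this kind typically maximize $I(S; Z)$ between visited states $S$ and a skill latent $Z$, usually through a variational lower bound with a learned skill discriminator $q_\phi(z \mid s)$:

$I(S; Z) = H(Z) - H(Z \mid S) \ge \mathbb{E}_{z \sim p(z),\, s \sim \pi_z}\big[\log q_\phi(z \mid s) - \log p(z)\big]$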
Learning with an objective to minimize the mismatch with a reference distribution has been shown to be useful for generative modeling and imitation learning. In this paper, we investigate whether one such objective, the Wasserstein-1 distance between ...
External link: http://arxiv.org/abs/2105.13345
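For reference, the Wasserstein-1 distance between two distributions $\mu$ and $\nu$ (here, for example, a policy's state-visitation distribution and a reference distribution) is commonly used through its Kantorovich-Rubinstein dual form; this is the standard definition, not text from the paper:

$W_1(\mu, \nu) = \sup_{\|f\|_{L} \le 1} \; \mathbb{E}_{x \sim \mu}[f(x)] - \mathbb{E}_{y \sim \nu}[f(y)]$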
Temporal difference (TD) learning is one of the main foundations of modern reinforcement learning. This paper studies the use of TD(0), a canonical TD algorithm, to estimate the value function of a given policy from a batch of data. In this batch setting, ...
External link: http://arxiv.org/abs/2008.06738
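To make the setting concrete, a tabular TD(0) sketch over a fixed batch of transitions might look like the following; the state indexing, hyperparameters, and sweep count are illustrative assumptions, since the paper analyzes this setting rather than prescribing this exact code.

```python
import numpy as np

def batch_td0(transitions, num_states, alpha=0.1, gamma=0.99, sweeps=100):
    # transitions: list of (s, r, s_next, done) tuples collected under the
    # policy being evaluated; repeated sweeps apply TD(0) to the fixed batch.
    V = np.zeros(num_states)
    for _ in range(sweeps):
        for s, r, s_next, done in transitions:
            target = r + (0.0 if done else gamma * V[s_next])
            V[s] += alpha * (target - V[s])
    return V
```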
Authors: Desai, Siddharth, Durugkar, Ishan, Karnan, Haresh, Warnell, Garrett, Hanna, Josiah, Stone, Peter
Published in: Neural Information Processing Systems (NeurIPS 2020)
We examine the problem of transferring a policy learned in a source environment to a target environment with different dynamics, particularly in the case where it is critical to reduce the amount of interaction with the target environment during learning ...
External link: http://arxiv.org/abs/2008.01594
Policy gradient algorithms typically combine discounted future rewards with an estimated value function to compute the direction and magnitude of parameter updates. However, for most reinforcement learning tasks, humans can provide additional insight ...
External link: http://arxiv.org/abs/1904.03295
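As textbook background for the abstract above (not the paper's contribution), the policy-gradient update it refers to combines discounted returns $G_t$ with an estimated value-function baseline $\hat{V}$:

$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\big[\sum_t \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,(G_t - \hat{V}(s_t))\big], \qquad G_t = \sum_{k \ge t} \gamma^{\,k-t} r_k$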
Authors: Das, Rajarshi, Dhuliawala, Shehzaad, Zaheer, Manzil, Vilnis, Luke, Durugkar, Ishan, Krishnamurthy, Akshay, Smola, Alex, McCallum, Andrew
Knowledge bases (KB), both automatically and manually constructed, are often incomplete --- many valid facts can be inferred from the KB by synthesizing existing information. A popular approach to KB completion is to infer new relations by combinatory reasoning ...
External link: http://arxiv.org/abs/1711.05851