Showing 1 - 10 of 72 for search: '"FUJIMOTO, SCOTT"'
Ensuring long-term fairness is crucial when developing automated decision making systems, specifically in dynamic and sequential environments. By maximizing their reward without consideration of fairness, AI agents can introduce disparities in their …
External link:
http://arxiv.org/abs/2412.17123
Author:
Zhan, Wenhao, Fujimoto, Scott, Zhu, Zheqing, Lee, Jason D., Jiang, Daniel R., Efroni, Yonathan
We study the problem of learning an approximate equilibrium in the offline multi-agent reinforcement learning (MARL) setting. We introduce a structural assumption -- the interaction rank -- and establish that functions with low interaction rank are …
External link:
http://arxiv.org/abs/2410.01101
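For the entry above, one natural reading of a low-interaction-rank assumption (an illustration only, not the paper's formal definition) is that the joint function decomposes into terms that each couple only small groups of agents:

\[
Q(s, a_1, \dots, a_n) \;=\; \sum_{S \subseteq \{1,\dots,n\},\ |S| \le k} f_S(s, a_S),
\]

where k is the interaction rank, a_S collects the actions of the agents in S, and the component functions f_S are hypothetical names introduced here; k = 1 would correspond to a fully additive, per-agent structure.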
Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine optimal transport …
External link:
http://arxiv.org/abs/2310.01632
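To make the optimal-transport angle of the entry above concrete, here is a minimal, hypothetical sketch (not the paper's method): an entropic-regularized OT plan (Sinkhorn) between learner and expert state visitations, whose coupling is turned into a per-state proxy reward. All function and variable names are illustrative.

import numpy as np

def sinkhorn_coupling(cost, reg=0.1, n_iters=200):
    # Entropic-regularized OT between two uniform empirical distributions.
    # cost: (n, m) pairwise cost matrix between learner and expert states.
    n, m = cost.shape
    K = np.exp(-cost / reg)                 # Gibbs kernel
    a, b = np.ones(n) / n, np.ones(m) / m   # target marginals
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]      # transport plan with marginals (a, b)

def ot_imitation_rewards(learner_states, expert_states):
    # Proxy reward: negative transport cost attributed to each learner state.
    cost = np.linalg.norm(
        learner_states[:, None, :] - expert_states[None, :, :], axis=-1)
    plan = sinkhorn_coupling(cost)
    # Each learner state is charged the expected cost of where its mass is shipped.
    per_state_cost = (plan * cost).sum(axis=1) * cost.shape[0]
    return -per_state_cost

# Example: 2-D states, learner trajectory vs. expert trajectory.
rng = np.random.default_rng(0)
expert = rng.normal(size=(50, 2))
learner = rng.normal(loc=0.5, size=(50, 2))
print(ot_imitation_rewards(learner, expert)[:5])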
Author:
Fujimoto, Scott, Chang, Wei-Di, Smith, Edward J., Gu, Shixiang Shane, Precup, Doina, Meger, David
In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel …
External link:
http://arxiv.org/abs/2306.02451
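A hedged sketch of the general idea behind state-action learned embeddings in the entry above: separate encoders produce a state embedding and a state-action embedding, trained so that the state-action embedding predicts the (stop-gradient) embedding of the next state. The module names, sizes, and loss below are illustrative, not the exact SALE architecture.

import torch
import torch.nn as nn

class StateActionEmbedding(nn.Module):
    # Illustrative SALE-style encoders: z_s = f(s), z_sa = g(z_s, a).
    def __init__(self, state_dim, action_dim, embed_dim=256):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ELU(),
                               nn.Linear(embed_dim, embed_dim))
        self.g = nn.Sequential(nn.Linear(embed_dim + action_dim, embed_dim), nn.ELU(),
                               nn.Linear(embed_dim, embed_dim))

    def forward(self, state, action):
        z_s = self.f(state)
        z_sa = self.g(torch.cat([z_s, action], dim=-1))
        return z_s, z_sa

def embedding_loss(encoder, state, action, next_state):
    # Train z_sa to predict the embedding of the next state (target is detached).
    _, z_sa = encoder(state, action)
    with torch.no_grad():
        z_next = encoder.f(next_state)
    return ((z_sa - z_next) ** 2).mean()

# Usage on a dummy batch (state_dim=17, action_dim=6, e.g. a MuJoCo-like task).
enc = StateActionEmbedding(17, 6)
s, a, s2 = torch.randn(32, 17), torch.randn(32, 6), torch.randn(32, 17)
embedding_loss(enc, s, a, s2).backward()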
We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only. Our approach decouples reward modelling from policy learning, unlike state-of-the-art adversarial methods which require updating the reward model during …
External link:
http://arxiv.org/abs/2205.09251
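The structural contrast in the entry above (decoupled reward modelling versus adversarial methods that interleave reward and policy updates) can be illustrated with a hypothetical two-phase training layout; the helper callables below are placeholders, not the paper's algorithm.

def decoupled_irl(expert_states, env, fit_reward, rl_step, n_rl_steps=10_000):
    # Phase 1: fit a reward model on expert states only (no policy in the loop).
    reward_fn = fit_reward(expert_states)
    # Phase 2: run standard RL against that fixed reward.
    policy = None
    for _ in range(n_rl_steps):
        policy = rl_step(env, policy, reward_fn)   # reward_fn is never updated here
    return policy

def adversarial_irl(expert_states, env, update_reward, rl_step, n_steps=10_000):
    # For contrast: adversarial IRL alternates reward and policy updates.
    reward_fn, policy = None, None
    for _ in range(n_steps):
        reward_fn = update_reward(reward_fn, policy, expert_states)
        policy = rl_step(env, policy, reward_fn)
    return policy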
In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) …
External link:
http://arxiv.org/abs/2201.12417
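For reference, the two quantities contrasted in the entry above can be written with standard definitions (notation is mine, assuming discount γ and policy π, not quoted from the paper): the value error compares Q directly with Q^π, while the Bellman error measures the gap between the two sides of the Bellman equation.

\[
\text{value error: } \bigl|Q(s,a) - Q^{\pi}(s,a)\bigr|,
\qquad
\text{Bellman error: } \Bigl|Q(s,a) - \bigl(r(s,a) + \gamma\,\mathbb{E}_{s' \sim P(\cdot \mid s,a),\, a' \sim \pi}\bigl[Q(s',a')\bigr]\bigr)\Bigr|.
\]

The concern raised here is that a small Bellman error on the state-action pairs available in a dataset need not imply a small value error.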
Author:
Fujimoto, Scott, Gu, Shixiang Shane
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of constraining or regularizing the policy …
External link:
http://arxiv.org/abs/2106.06860
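One widely used form of the policy constraint mentioned in the entry above is a behavior-cloning term added to the actor objective (as in TD3+BC); the snippet below is a minimal sketch of that generic actor loss for a batch of dataset states and actions, not a verbatim reproduction of the paper's algorithm.

import torch

def regularized_actor_loss(actor, critic, states, dataset_actions, alpha=2.5):
    # Maximize Q(s, pi(s)) while keeping pi(s) close to the dataset actions.
    # The Q term is normalized by its detached mean magnitude so the trade-off
    # is insensitive to the scale of the value estimates.
    pi = actor(states)
    q = critic(states, pi)
    lam = alpha / q.abs().mean().detach()
    bc = ((pi - dataset_actions) ** 2).mean()
    return -(lam * q).mean() + bc

# Dummy usage with placeholder networks (state_dim=17, action_dim=6).
actor = torch.nn.Sequential(torch.nn.Linear(17, 6), torch.nn.Tanh())
critic = lambda s, a: torch.cat([s, a], dim=-1).sum(dim=-1, keepdim=True)
s, a = torch.randn(32, 17), torch.randn(32, 6)
print(regularized_actor_loss(actor, critic, s, a))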
Marginalized importance sampling (MIS), which measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution, is a promising approach for off-policy evaluation. However, current state-of-the-art MIS …
External link:
http://arxiv.org/abs/2106.06854
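The quantity described in the entry above can be written down directly in standard MIS notation (assuming discount γ, target policy π, and behavior occupancy d^μ; notation is mine): the density ratio w re-weights rewards sampled from the behavior distribution to estimate the return of π.

\[
w_{\pi/\mu}(s,a) \;=\; \frac{d^{\pi}(s,a)}{d^{\mu}(s,a)},
\qquad
J(\pi) \;\approx\; \frac{1}{1-\gamma}\;\mathbb{E}_{(s,a,r)\sim d^{\mu}}\bigl[\,w_{\pi/\mu}(s,a)\,r\,\bigr],
\]

where d^π and d^μ are the discounted state-action occupancy measures of the target policy and of the sampling distribution, respectively.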
Prioritized Experience Replay (PER) is a deep reinforcement learning technique in which agents learn from transitions sampled with non-uniform probability proportionate to their temporal-difference error. We show that any loss function evaluated with …
External link:
http://arxiv.org/abs/2007.06049
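As a concrete reference point for the non-uniform sampling scheme analyzed in the entry above, here is a minimal numpy sketch of PER-style sampling probabilities and the usual importance-sampling correction; the hyperparameter names (alpha, beta) follow the common PER convention and are not taken from this paper.

import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6, rng=None):
    # Sample transition indices with probability proportional to |TD error|^alpha,
    # and return the importance-sampling weights that correct for that bias.
    rng = rng or np.random.default_rng()
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    idx = rng.choice(len(td_errors), size=batch_size, p=probs)
    # IS weights: (N * P(i))^-beta, normalized by the max for stability.
    weights = (len(td_errors) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()

# Example: 1000 stored transitions with random TD errors.
idx, w = per_sample(np.random.randn(1000), batch_size=32)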