Showing 1 - 10 of 68 results for search: '"Hanna, Josiah P."'
In reinforcement learning, offline value function learning is the procedure of using an offline dataset to estimate the expected discounted return from each state when taking actions according to a fixed target policy. The stability of this procedure…
External link:
http://arxiv.org/abs/2410.01643
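The snippet above defines offline value function learning as estimating each state's expected discounted return under a fixed target policy from a fixed dataset. As a rough illustration only (not the paper's method), a tabular off-policy TD(0) sweep over such a dataset might look like the following; the dataset format, the two policies, and the step size are assumptions.

    # Illustrative sketch: tabular off-policy TD(0) over a fixed offline dataset.
    import numpy as np

    def offline_td0(dataset, target_pi, behavior_mu, n_states,
                    gamma=0.99, alpha=0.1, sweeps=50):
        """Estimate V^pi(s) from offline (s, a, r, s_next) transitions.

        dataset     : list of (s, a, r, s_next) tuples collected by behavior_mu
        target_pi   : array [n_states, n_actions] of target-policy probabilities
        behavior_mu : array [n_states, n_actions] of behavior-policy probabilities
        """
        V = np.zeros(n_states)
        for _ in range(sweeps):
            for s, a, r, s_next in dataset:
                rho = target_pi[s, a] / behavior_mu[s, a]   # importance ratio
                td_error = r + gamma * V[s_next] - V[s]
                V[s] += alpha * rho * td_error              # off-policy TD(0) update
        return V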
We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves…
External link:
http://arxiv.org/abs/2406.17168
In this paper, we study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure, and the algorithm exploits the shared structure to minimize the cumulative regret…
External link:
http://arxiv.org/abs/2406.05064
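For context on the objective named above, cumulative regret is the total gap between the best arm's mean reward and the mean reward of each arm actually pulled. A toy computation (the arm means and pull sequence below are made up):

    # Illustrative only: cumulative regret of a bandit run.
    import numpy as np

    def cumulative_regret(arm_means, chosen_arms):
        """Regret after T pulls: sum_t (best mean - mean of the arm pulled at t)."""
        best = np.max(arm_means)
        return np.sum(best - arm_means[np.asarray(chosen_arms)])

    means = np.array([0.2, 0.5, 0.9])        # hypothetical arm means
    pulls = [0, 1, 2, 2, 2]                  # hypothetical choices over 5 rounds
    print(cumulative_regret(means, pulls))   # (0.9-0.2)+(0.9-0.5) = 1.1 (up to float error)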
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain…
External link:
http://arxiv.org/abs/2406.02165
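The policy-evaluation goal stated above, estimating the expected cumulative reward of a given target policy, can be illustrated with a plain Monte Carlo rollout estimator. This is a generic sketch, not the paper's safe data-collection procedure, and the env.reset/env.step interface is an assumed toy tabular environment.

    # Generic sketch: Monte Carlo estimate of a target policy's expected return.
    import numpy as np

    def mc_policy_value(env, target_pi, n_episodes=1000, gamma=1.0, horizon=100):
        """Average discounted return of rollouts that follow target_pi.

        target_pi : array [n_states, n_actions] of action probabilities
        env       : assumed toy environment with reset() -> s and step(a) -> (s, r, done)
        """
        rng = np.random.default_rng(0)
        returns = []
        for _ in range(n_episodes):
            s = env.reset()
            g, discount = 0.0, 1.0
            for _ in range(horizon):
                a = rng.choice(len(target_pi[s]), p=target_pi[s])
                s, r, done = env.step(a)
                g += discount * r
                discount *= gamma
                if done:
                    break
            returns.append(g)
        return float(np.mean(returns))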
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or…
External link:
http://arxiv.org/abs/2405.07838
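A GVF as described above is a value estimate for a policy-specific pseudo-reward (cumulant), typically with a state-dependent continuation (discount). A minimal tabular TD(0) sketch under those standard assumptions; the cumulant and continuation functions here are placeholders, not anything from the paper.

    # Minimal GVF sketch: TD(0) on a cumulant signal instead of the task reward.
    import numpy as np

    def gvf_td0(transitions, cumulant, continuation, n_states, alpha=0.1):
        """transitions : (s, s_next) pairs generated while following the GVF's policy
        cumulant(s, s_next)  -> pseudo-reward for this GVF
        continuation(s_next) -> state-dependent discount gamma(s_next)
        """
        V = np.zeros(n_states)
        for s, s_next in transitions:
            target = cumulant(s, s_next) + continuation(s_next) * V[s_next]
            V[s] += alpha * (target - V[s])
        return V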
Learning a good history representation is one of the core challenges of reinforcement learning (RL) in partially observable environments. Recent works have shown the advantages of various auxiliary tasks for facilitating representation learning. However…
External link:
http://arxiv.org/abs/2402.07102
Author:
Corrado, Nicholas E., Hanna, Josiah P.
On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match…
External link:
http://arxiv.org/abs/2311.08290
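The finite-sample mismatch mentioned above can be seen in a few lines: with a small on-policy batch, the empirical action frequencies generally differ from the probabilities of the policy that generated them. The policy and batch size below are purely illustrative.

    # Illustration: empirical action frequencies vs. the sampling policy.
    import numpy as np

    rng = np.random.default_rng(0)
    pi = np.array([0.7, 0.2, 0.1])              # current policy over 3 actions
    actions = rng.choice(3, size=20, p=pi)      # a small on-policy batch
    empirical = np.bincount(actions, minlength=3) / len(actions)
    print("policy   :", pi)
    print("empirical:", empirical)              # typically not equal to pi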
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature…
External link:
http://arxiv.org/abs/2311.00327
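The bilinear reward model described above can be written as r = x^T Theta y plus noise, where x and y are the known feature vectors of the two chosen arms and Theta is an unknown parameter matrix. A tiny sketch with assumed dimensions and noise scale:

    # Sketch of a bilinear bandit reward; dimensions and noise are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d1, d2 = 4, 3
    Theta = rng.normal(size=(d1, d2))            # unknown parameter matrix
    x = rng.normal(size=d1)                      # known features of the left arm
    y = rng.normal(size=d2)                      # known features of the right arm
    reward = x @ Theta @ y + rng.normal(scale=0.1)   # bilinear reward + noise
    print(reward)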
Author:
Pavse, Brahma S., Hanna, Josiah P.
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically successful…
External link:
http://arxiv.org/abs/2310.18409
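A standard baseline for the OPE problem described above is ordinary (trajectory-wise) importance sampling. The sketch below is that generic estimator, not the paper's approach; the trajectory format and policy representations are assumptions.

    # Generic OPE baseline: trajectory-wise importance sampling.
    import numpy as np

    def importance_sampling_ope(trajectories, eval_pi, behavior_pi, gamma=0.99):
        """trajectories : list of lists of (s, a, r) steps collected by behavior_pi
        eval_pi / behavior_pi : arrays [n_states, n_actions] of action probabilities
        Returns the IS estimate of the evaluation policy's expected return."""
        estimates = []
        for traj in trajectories:
            weight, ret, discount = 1.0, 0.0, 1.0
            for s, a, r in traj:
                weight *= eval_pi[s, a] / behavior_pi[s, a]   # likelihood ratio
                ret += discount * r
                discount *= gamma
            estimates.append(weight * ret)
        return float(np.mean(estimates))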
In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data. While offline RL has been successful in learning real-world robot control policies, it typically requires large amounts…
External link:
http://arxiv.org/abs/2310.18247