Showing 1 - 10 of 68 results for search: '"Hanna, Josiah P."'
In reinforcement learning, offline value function learning is the procedure of using an offline dataset to estimate the expected discounted return from each state when taking actions according to a fixed target policy. The stability of this procedure…
External link:
http://arxiv.org/abs/2410.01643
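The snippet above defines offline value function learning as estimating each state's expected discounted return under a fixed target policy from a fixed dataset. As a rough illustration only (not the paper's method), a tabular off-policy TD(0) sweep over such a dataset might look like the following; the dataset format, the two policies, and the step size are assumptions.

    # Illustrative sketch: tabular off-policy TD(0) over a fixed offline dataset.
    import numpy as np

    def offline_td0(dataset, target_pi, behavior_mu, n_states,
                    gamma=0.99, alpha=0.1, sweeps=50):
        """Estimate V^pi(s) from offline (s, a, r, s_next) transitions.

        dataset     : list of (s, a, r, s_next) tuples collected by behavior_mu
        target_pi   : array [n_states, n_actions] of target-policy probabilities
        behavior_mu : array [n_states, n_actions] of behavior-policy probabilities
        """
        V = np.zeros(n_states)
        for _ in range(sweeps):
            for s, a, r, s_next in dataset:
                rho = target_pi[s, a] / behavior_mu[s, a]   # importance ratio
                td_error = r + gamma * V[s_next] - V[s]
                V[s] += alpha * rho * td_error              # off-policy TD(0) update
        return V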
We present Reinforcement Learning via Auxiliary Task Distillation (AuxDistill), a new method that enables reinforcement learning (RL) to perform long-horizon robot control problems by distilling behaviors from auxiliary RL tasks. AuxDistill achieves…
External link:
http://arxiv.org/abs/2406.17168
In this paper, we study the multi-task structured bandit problem, where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure, and the algorithm exploits the shared structure to minimize the cumulative regret…
External link:
http://arxiv.org/abs/2406.05064
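For context on the objective named above, cumulative regret is the total gap between the best arm's mean reward and the mean reward of each arm actually pulled. A toy computation (the arm means and pull sequence below are made up):

    # Illustrative only: cumulative regret of a bandit run.
    import numpy as np

    def cumulative_regret(arm_means, chosen_arms):
        """Regret after T pulls: sum_t (best mean - mean of the arm pulled at t)."""
        best = np.max(arm_means)
        return np.sum(best - arm_means[np.asarray(chosen_arms)])

    means = np.array([0.2, 0.5, 0.9])        # hypothetical arm means
    pulls = [0, 1, 2, 2, 2]                  # hypothetical choices over 5 rounds
    print(cumulative_regret(means, pulls))   # (0.9-0.2)+(0.9-0.5) = 1.1 (up to float error)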
In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain…
External link:
http://arxiv.org/abs/2406.02165
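The policy-evaluation goal stated above, estimating the expected cumulative reward of a given target policy, can be illustrated with a plain Monte Carlo rollout estimator. This is a generic sketch, not the paper's safe data-collection procedure, and the env.reset/env.step interface is an assumed toy tabular environment.

    # Generic sketch: Monte Carlo estimate of a target policy's expected return.
    import numpy as np

    def mc_policy_value(env, target_pi, n_episodes=1000, gamma=1.0, horizon=100):
        """Average discounted return of rollouts that follow target_pi.

        target_pi : array [n_states, n_actions] of action probabilities
        env       : assumed toy environment with reset() -> s and step(a) -> (s, r, done)
        """
        rng = np.random.default_rng(0)
        returns = []
        for _ in range(n_episodes):
            s = env.reset()
            g, discount = 0.0, 1.0
            for _ in range(horizon):
                a = rng.choice(len(target_pi[s]), p=target_pi[s])
                s, r, done = env.step(a)
                g += discount * r
                discount *= gamma
                if done:
                    break
            returns.append(g)
        return float(np.mean(returns))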
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or…
External link:
http://arxiv.org/abs/2405.07838
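A GVF as described above is a value estimate for a policy-specific pseudo-reward (cumulant), typically with a state-dependent continuation (discount). A minimal tabular TD(0) sketch under those standard assumptions; the cumulant and continuation functions here are placeholders, not anything from the paper.

    # Minimal GVF sketch: TD(0) on a cumulant signal instead of the task reward.
    import numpy as np

    def gvf_td0(transitions, cumulant, continuation, n_states, alpha=0.1):
        """transitions : (s, s_next) pairs generated while following the GVF's policy
        cumulant(s, s_next)  -> pseudo-reward for this GVF
        continuation(s_next) -> state-dependent discount gamma(s_next)
        """
        V = np.zeros(n_states)
        for s, s_next in transitions:
            target = cumulant(s, s_next) + continuation(s_next) * V[s_next]
            V[s] += alpha * (target - V[s])
        return V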
Learning a good history representation is one of the core challenges of reinforcement learning (RL) in partially observable environments. Recent works have shown the advantages of various auxiliary tasks for facilitating representation learning. However…
External link:
http://arxiv.org/abs/2402.07102
Author:
Corrado, Nicholas E., Hanna, Josiah P.
On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match…
External link:
http://arxiv.org/abs/2311.08290
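The finite-sample mismatch mentioned above can be seen in a few lines: with a small on-policy batch, the empirical action frequencies generally differ from the probabilities of the policy that generated them. The policy and batch size below are purely illustrative.

    # Illustration: empirical action frequencies vs. the sampling policy.
    import numpy as np

    rng = np.random.default_rng(0)
    pi = np.array([0.7, 0.2, 0.1])              # current policy over 3 actions
    actions = rng.choice(3, size=20, p=pi)      # a small on-policy batch
    empirical = np.bincount(actions, minlength=3) / len(actions)
    print("policy   :", pi)
    print("empirical:", empirical)              # typically not equal to pi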
We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes the form of a pair of arms from two different entity types and the reward is a bilinear function of the known feature…
External link:
http://arxiv.org/abs/2311.00327
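The bilinear reward model described above can be written as r = x^T Theta y plus noise, where x and y are the known feature vectors of the two chosen arms and Theta is an unknown parameter matrix. A tiny sketch with assumed dimensions and noise scale:

    # Sketch of a bilinear bandit reward; dimensions and noise are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    d1, d2 = 4, 3
    Theta = rng.normal(size=(d1, d2))            # unknown parameter matrix
    x = rng.normal(size=d1)                      # known features of the left arm
    y = rng.normal(size=d2)                      # known features of the right arm
    reward = x @ Theta @ y + rng.normal(scale=0.1)   # bilinear reward + noise
    print(reward)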
Author:
Pavse, Brahma S., Hanna, Josiah P.
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically successful…
External link:
http://arxiv.org/abs/2310.18409
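A standard baseline for the OPE problem described above is ordinary (trajectory-wise) importance sampling. The sketch below is that generic estimator, not the paper's approach; the trajectory format and policy representations are assumptions.

    # Generic OPE baseline: trajectory-wise importance sampling.
    import numpy as np

    def importance_sampling_ope(trajectories, eval_pi, behavior_pi, gamma=0.99):
        """trajectories : list of lists of (s, a, r) steps collected by behavior_pi
        eval_pi / behavior_pi : arrays [n_states, n_actions] of action probabilities
        Returns the IS estimate of the evaluation policy's expected return."""
        estimates = []
        for traj in trajectories:
            weight, ret, discount = 1.0, 0.0, 1.0
            for s, a, r in traj:
                weight *= eval_pi[s, a] / behavior_pi[s, a]   # likelihood ratio
                ret += discount * r
                discount *= gamma
            estimates.append(weight * ret)
        return float(np.mean(estimates))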
In offline reinforcement learning (RL), an RL agent learns to solve a task using only a fixed dataset of previously collected data. While offline RL has been successful in learning real-world robot control policies, it typically requires large amounts…
External link:
http://arxiv.org/abs/2310.18247