Výsledky vyhledávání - "Zhang, Shangtong"

Report

Efficient Policy Evaluation with Safety Constraint for Reinforcement Learning

Autor: Chen, Claire, Liu, Shuze, Zhang, Shangtong

In reinforcement learning, classic on-policy evaluation methods often suffer from high variance and require massive online data to attain the desired accuracy. Previous studies attempt to reduce evaluation variance by searching for or designing prope

Externí odkaz: http://arxiv.org/abs/2410.05655

Zobrazit plný text záznamu

Report

Doubly Optimal Policy Evaluation for Reinforcement Learning

Autor: Liu, Shuze, Chen, Claire, Zhang, Shangtong

Policy evaluation estimates the performance of a policy by (1) collecting data from the environment and (2) processing raw data into a meaningful estimate. Due to the sequential nature of reinforcement learning, any improper data-collecting policy or

Externí odkaz: http://arxiv.org/abs/2410.02226

Zobrazit plný text záznamu

Report

Almost Sure Convergence of Average Reward Temporal Difference Learning

Autor: Blaser, Ethan, Zhang, Shangtong

Tabular average reward Temporal Difference (TD) learning is perhaps the simplest and the most fundamental policy evaluation algorithm in average reward reinforcement learning. After at least 25 years since its discovery, we are finally able to provid

Externí odkaz: http://arxiv.org/abs/2409.19546

Zobrazit plný text záznamu

Report

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

Autor: Wang, Jiuqi, Zhang, Shangtong

Temporal difference (TD) learning with linear function approximation, abbreviated as linear TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well understood that linear TD converges almost surely to a unique p

Externí odkaz: http://arxiv.org/abs/2409.12135

Zobrazit plný text záznamu

Report

Efficient Multi-Policy Evaluation for Reinforcement Learning

Autor: Liu, Shuze, Chen, Yuxin, Zhang, Shangtong

To unbiasedly evaluate multiple target policies, the dominant approach among RL practitioners is to run and evaluate each target policy separately. However, this evaluation method is far from efficient because samples are not shared across policies,

Externí odkaz: http://arxiv.org/abs/2408.08706

Zobrazit plný text záznamu

Report

Transformers Learn Temporal Difference Methods for In-Context Reinforcement Learning

Autor: Wang, Jiuqi, Blaser, Ethan, Daneshmand, Hadi, Zhang, Shangtong

In-context learning refers to the learning ability of a model during inference time without adapting its parameters. The input (i.e., prompt) to the model (e.g., transformers) consists of both a context (i.e., instance-label pairs) and a query instan

Externí odkaz: http://arxiv.org/abs/2405.13861

Zobrazit plný text záznamu

Report

The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise

Autor: Liu, Shuze, Chen, Shuhang, Zhang, Shangtong

Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic

Externí odkaz: http://arxiv.org/abs/2401.07844

Zobrazit plný text záznamu

Report

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level ex

Externí odkaz: http://arxiv.org/abs/2308.03526

Zobrazit plný text záznamu

Report

Direct Gradient Temporal Difference Learning

Autor: Qian, Xiaochi, Zhang, Shangtong

Off-policy learning enables a reinforcement learning (RL) agent to reason counterfactually about policies that are not executed and is one of the most important ideas in RL. It, however, can lead to instability when combined with function approximati

Externí odkaz: http://arxiv.org/abs/2308.01170

Zobrazit plný text záznamu

Report

Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

Autor: Liu, Shuze, Zhang, Shangtong

Most reinforcement learning practitioners evaluate their policies with online Monte Carlo estimators for either hyperparameter tuning or testing different algorithmic design choices, where the policy is repeatedly executed in the environment to get t

Externí odkaz: http://arxiv.org/abs/2301.13734

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání