Showing 1 - 10 of 18 results for the search: '"Weisz, Gellért"'
We consider offline reinforcement learning (RL) in $H$-horizon Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where the action-value function of every policy is linear with respect to a given $d$-dimensional feature map…
External link: http://arxiv.org/abs/2405.16809
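For readers skimming these results, the linear $q^\pi$-realizability assumption named in this entry (and in the next one) can be stated as follows; the feature map $\phi$ and the per-policy weight vectors $w_\pi$ are generic notation chosen for illustration, not quoted from the paper:

$$ q^\pi(s,a) = \langle \phi(s,a),\, w_\pi \rangle \quad \text{for every policy } \pi \text{ and every state-action pair } (s,a), $$

where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is the given feature map and the weight vector $w_\pi \in \mathbb{R}^d$ may differ from policy to policy.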
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features…
External link: http://arxiv.org/abs/2310.07811
While policy optimization algorithms have played an important role in recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited -- they are either restricted to tabular…
External link: http://arxiv.org/abs/2305.11032
Authors: Kane, Daniel; Liu, Sihan; Lovett, Shachar; Mahajan, Gaurav; Szepesvári, Csaba; Weisz, Gellért
A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem's counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently…
External link: http://arxiv.org/abs/2302.12940
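As a point of comparison for the question this abstract raises, its supervised-learning counterpart, linear regression, is solved by ordinary least squares; the formula below is standard textbook material, not a result of the paper:

$$ \hat{\theta} = \Big( \sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^\top \Big)^{-1} \sum_{i=1}^{n} \phi(x_i)\, y_i , $$

whereas the RL question asks whether value functions satisfying $q^*(s,a) = \langle \phi(s,a), \theta^* \rangle$ can be learned with comparable statistical and computational cost.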
We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API)…
External link: http://arxiv.org/abs/2210.15755
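To illustrate the family of methods this entry belongs to, here is a minimal sketch of generic approximate policy iteration with linear action-value functions (an LSPI-style loop); the function names and the least-squares evaluation step are illustrative assumptions, not the API variant proposed in the paper:

    import numpy as np

    def approximate_policy_iteration(features, samples, gamma, n_iters=20):
        """Generic approximate policy iteration with linear q-functions.

        features: callable (s, a) -> np.ndarray of shape (d,)
        samples:  list of transitions (s, a, r, s_next)
        gamma:    discount factor in [0, 1)
        """
        d = features(*samples[0][:2]).shape[0]
        w = np.zeros(d)                                      # weights of the current q-estimate
        actions = sorted({a for (_, a, _, _) in samples})    # assumes orderable action labels

        def greedy_action(s):
            return max(actions, key=lambda a: features(s, a) @ w)

        for _ in range(n_iters):
            # Policy evaluation via least-squares TD (LSTD-Q) for the current greedy policy.
            A = np.zeros((d, d))
            b = np.zeros(d)
            for (s, a, r, s_next) in samples:
                phi = features(s, a)
                phi_next = features(s_next, greedy_action(s_next))
                A += np.outer(phi, phi - gamma * phi_next)
                b += r * phi
            w = np.linalg.lstsq(A, b, rcond=None)[0]         # improvement step: new greedy policy uses this w
        return w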
We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the…
External link: http://arxiv.org/abs/2110.02195
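For context, the query complexity studied here counts calls to the generative model (simulator). A generic way to phrase the planner's goal, not quoted from the paper, is: using at most $N(d, H, 1/\epsilon)$ simulator queries, output a policy $\pi_{\mathrm{out}}$ with

$$ v^{\pi_{\mathrm{out}}}(s_0) \ge v^*(s_0) - \epsilon . $$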
Authors: Weisz, Gellért; Amortila, Philip; Janzer, Barnabás; Abbasi-Yadkori, Yasin; Jiang, Nan; Szepesvári, Csaba
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map. The generative model provides a local access to the MDP: The planner can ask for random transitions…
External link: http://arxiv.org/abs/2102.02049
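A minimal sketch of what "local access" to a generative model means: the planner may only request transitions from the initial state or from states the simulator has already returned. The class and method names below are hypothetical, chosen purely for illustration:

    class LocalAccessSimulator:
        """Generative model with local access: transitions may only be requested
        from the initial state or from states previously returned by the simulator."""

        def __init__(self, transition_fn, reward_fn, initial_state):
            self._transition_fn = transition_fn    # (state, action) -> sampled next state
            self._reward_fn = reward_fn            # (state, action) -> float
            self._visited = {initial_state}        # states the planner is allowed to query

        def step(self, state, action):
            if state not in self._visited:
                raise ValueError("local access violated: state was never returned by the simulator")
            next_state = self._transition_fn(state, action)
            self._visited.add(next_state)          # next_state becomes queryable from now on
            return next_state, self._reward_fn(state, action)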
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map…
External link: http://arxiv.org/abs/2010.01374
The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$…
External link: http://arxiv.org/abs/1911.07676
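The setting referred to here is a misspecified linear bandit: the given features approximate the mean rewards uniformly up to $\epsilon$. Stated generically (notation mine, not the paper's):

$$ \big| r(a) - \langle \phi(a), \theta \rangle \big| \le \epsilon \quad \text{for all actions } a, $$

and the question is how many queries are needed to return an action $\hat{a}$ with $r(\hat{a}) \ge \max_a r(a) - O(\epsilon)$.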
We study algorithms for average-cost reinforcement learning problems with value function approximation. Our starting point is the recently proposed POLITEX algorithm, a version of policy iteration where the policy produced in each iteration is near-optimal in hindsight…
External link: http://arxiv.org/abs/1908.10479
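Concretely, the "near-optimal in hindsight" policy in POLITEX is a Boltzmann (softmax) policy over the sum of all past action-value estimates. A minimal sketch of that update, assuming a generic linear q-estimator with per-phase weight vectors $w_1, \dots, w_t$ and illustrative function names of my own:

    import numpy as np

    def politex_policy(q_weights, state, actions, features, eta):
        """Boltzmann policy over the sum of all past q-function estimates.

        q_weights: list of weight vectors w_1, ..., w_t (one per phase)
        features:  callable (s, a) -> np.ndarray of shape (d,)
        eta:       inverse-temperature / learning-rate parameter
        """
        # Sum of past action-value estimates at this state (the "hindsight" objective).
        scores = np.array([sum(features(state, a) @ w for w in q_weights)
                           for a in actions])
        logits = eta * scores
        logits -= logits.max()             # subtract the max for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()         # probability assigned to each entry of `actions`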