Showing 1 - 10 of 18 results for the search: '"Weisz, Gellért"'
We consider offline reinforcement learning (RL) in $H$-horizon Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where the action-value function of every policy is linear with respect to a given $d$-dimensional feature map…
External link: http://arxiv.org/abs/2405.16809
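For readers skimming these results, the linear $q^\pi$-realizability assumption named in this entry (and in the next one) can be stated as follows; the feature map $\phi$ and the per-policy weight vectors $w_\pi$ are generic notation chosen for illustration, not quoted from the paper:

$$ q^\pi(s,a) = \langle \phi(s,a),\, w_\pi \rangle \quad \text{for every policy } \pi \text{ and every state-action pair } (s,a), $$

where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is the given feature map and the weight vector $w_\pi \in \mathbb{R}^d$ may differ from policy to policy.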
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features…
External link: http://arxiv.org/abs/2310.07811
While policy optimization algorithms have played an important role in recent empirical success of Reinforcement Learning (RL), the existing theoretical understanding of policy optimization remains rather limited -- they are either restricted to tabular…
External link: http://arxiv.org/abs/2305.11032
Authors: Kane, Daniel; Liu, Sihan; Lovett, Shachar; Mahajan, Gaurav; Szepesvári, Csaba; Weisz, Gellért
A fundamental question in reinforcement learning theory is: suppose the optimal value functions are linear in given features, can we learn them efficiently? This problem's counterpart in supervised learning, linear regression, can be solved both statistically and computationally efficiently…
External link: http://arxiv.org/abs/2302.12940
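As a point of comparison for the question this abstract raises, its supervised-learning counterpart, linear regression, is solved by ordinary least squares; the formula below is standard textbook material, not a result of the paper:

$$ \hat{\theta} = \Big( \sum_{i=1}^{n} \phi(x_i)\,\phi(x_i)^\top \Big)^{-1} \sum_{i=1}^{n} \phi(x_i)\, y_i , $$

whereas the RL question asks whether value functions satisfying $q^*(s,a) = \langle \phi(s,a), \theta^* \rangle$ can be learned with comparable statistical and computational cost.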
We consider approximate dynamic programming in $\gamma$-discounted Markov decision processes and apply it to approximate planning with linear value-function approximation. Our first contribution is a new variant of Approximate Policy Iteration (API)…
External link: http://arxiv.org/abs/2210.15755
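To illustrate the family of methods this entry belongs to, here is a minimal sketch of generic approximate policy iteration with linear action-value functions (an LSPI-style loop); the function names and the least-squares evaluation step are illustrative assumptions, not the API variant proposed in the paper:

    import numpy as np

    def approximate_policy_iteration(features, samples, gamma, n_iters=20):
        """Generic approximate policy iteration with linear q-functions.

        features: callable (s, a) -> np.ndarray of shape (d,)
        samples:  list of transitions (s, a, r, s_next)
        gamma:    discount factor in [0, 1)
        """
        d = features(*samples[0][:2]).shape[0]
        w = np.zeros(d)                                      # weights of the current q-estimate
        actions = sorted({a for (_, a, _, _) in samples})    # assumes orderable action labels

        def greedy_action(s):
            return max(actions, key=lambda a: features(s, a) @ w)

        for _ in range(n_iters):
            # Policy evaluation via least-squares TD (LSTD-Q) for the current greedy policy.
            A = np.zeros((d, d))
            b = np.zeros(d)
            for (s, a, r, s_next) in samples:
                phi = features(s, a)
                phi_next = features(s_next, greedy_action(s_next))
                A += np.outer(phi, phi - gamma * phi_next)
                b += r * phi
            w = np.linalg.lstsq(A, b, rcond=None)[0]         # improvement step: new greedy policy uses this w
        return w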
We consider the minimax query complexity of online planning with a generative model in fixed-horizon Markov decision processes (MDPs) with linear function approximation. Following recent works, we consider broad classes of problems where either (i) the…
External link: http://arxiv.org/abs/2110.02195
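For context, the query complexity studied here counts calls to the generative model (simulator). A generic way to phrase the planner's goal, not quoted from the paper, is: using at most $N(d, H, 1/\epsilon)$ simulator queries, output a policy $\pi_{\mathrm{out}}$ with

$$ v^{\pi_{\mathrm{out}}}(s_0) \ge v^*(s_0) - \epsilon . $$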
Authors: Weisz, Gellért; Amortila, Philip; Janzer, Barnabás; Abbasi-Yadkori, Yasin; Jiang, Nan; Szepesvári, Csaba
We consider local planning in fixed-horizon MDPs with a generative model under the assumption that the optimal value function lies close to the span of a feature map. The generative model provides a local access to the MDP: The planner can ask for random transitions…
External link: http://arxiv.org/abs/2102.02049
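A minimal sketch of what "local access" to a generative model means: the planner may only request transitions from the initial state or from states the simulator has already returned. The class and method names below are hypothetical, chosen purely for illustration:

    class LocalAccessSimulator:
        """Generative model with local access: transitions may only be requested
        from the initial state or from states previously returned by the simulator."""

        def __init__(self, transition_fn, reward_fn, initial_state):
            self._transition_fn = transition_fn    # (state, action) -> sampled next state
            self._reward_fn = reward_fn            # (state, action) -> float
            self._visited = {initial_state}        # states the planner is allowed to query

        def step(self, state, action):
            if state not in self._visited:
                raise ValueError("local access violated: state was never returned by the simulator")
            next_state = self._transition_fn(state, action)
            self._visited.add(next_state)          # next_state becomes queryable from now on
            return next_state, self._reward_fn(state, action)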
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes (MDPs) with linear function approximation and a generative model under the assumption that the optimal action-value function lies in the span of a feature map…
External link: http://arxiv.org/abs/2010.01374
The construction by Du et al. (2019) implies that even if a learner is given linear features in $\mathbb R^d$ that approximate the rewards in a bandit with a uniform error of $\epsilon$, then searching for an action that is optimal up to $O(\epsilon)$…
External link: http://arxiv.org/abs/1911.07676
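The setting referred to here is a misspecified linear bandit: the given features approximate the mean rewards uniformly up to $\epsilon$. Stated generically (notation mine, not the paper's):

$$ \big| r(a) - \langle \phi(a), \theta \rangle \big| \le \epsilon \quad \text{for all actions } a, $$

and the question is how many queries are needed to return an action $\hat{a}$ with $r(\hat{a}) \ge \max_a r(a) - O(\epsilon)$.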
We study algorithms for average-cost reinforcement learning problems with value function approximation. Our starting point is the recently proposed POLITEX algorithm, a version of policy iteration where the policy produced in each iteration is near-optimal in hindsight…
External link: http://arxiv.org/abs/1908.10479
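Concretely, the "near-optimal in hindsight" policy in POLITEX is a Boltzmann (softmax) policy over the sum of all past action-value estimates. A minimal sketch of that update, assuming a generic linear q-estimator with per-phase weight vectors $w_1, \dots, w_t$ and illustrative function names of my own:

    import numpy as np

    def politex_policy(q_weights, state, actions, features, eta):
        """Boltzmann policy over the sum of all past q-function estimates.

        q_weights: list of weight vectors w_1, ..., w_t (one per phase)
        features:  callable (s, a) -> np.ndarray of shape (d,)
        eta:       inverse-temperature / learning-rate parameter
        """
        # Sum of past action-value estimates at this state (the "hindsight" objective).
        scores = np.array([sum(features(state, a) @ w for w in q_weights)
                           for a in actions])
        logits = eta * scores
        logits -= logits.max()             # subtract the max for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()         # probability assigned to each entry of `actions`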