Zobrazeno 1 - 10
of 262
pro vyhledávání: '"Rakhlin, Alexander"'
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression, all characterized by unbounded losses in the setup where no assumptions are made on the magnitude of desi
Externí odkaz:
http://arxiv.org/abs/2410.21621
We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension $d_\text{elu}$$-$a compl
Externí odkaz:
http://arxiv.org/abs/2410.12713
In this paper, we develop a unified framework for lower bound methods in statistical estimation and interactive decision making. Classical lower bound techniques -- such as Fano's inequality, Le Cam's method, and Assouad's lemma -- have been central
Externí odkaz:
http://arxiv.org/abs/2410.05117
The ability to efficiently explore high-dimensional state spaces is essential for the practical success of deep Reinforcement Learning (RL). This paper introduces a new exploration technique called Random Latent Exploration (RLE), that combines the s
Externí odkaz:
http://arxiv.org/abs/2407.13755
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibilit
Externí odkaz:
http://arxiv.org/abs/2406.07920
Autor:
Xie, Tengyang, Foster, Dylan J., Krishnamurthy, Akshay, Rosset, Corby, Awadallah, Ahmed, Rakhlin, Alexander
Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to p
Externí odkaz:
http://arxiv.org/abs/2405.21046
Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simul
Externí odkaz:
http://arxiv.org/abs/2404.15417
$ $The classical theory of statistical estimation aims to estimate a parameter of interest under data generated from a fixed design ("offline estimation"), while the contemporary theory of online learning provides algorithms for estimation under adap
Externí odkaz:
http://arxiv.org/abs/2404.10122
We revisit the problem of offline reinforcement learning with value function realizability but without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et al. (2022) left open the question whether a bounded concentrability coeff
Externí odkaz:
http://arxiv.org/abs/2403.17091
In order to circumvent statistical and computational hardness results in sequential decision-making, recent work has considered smoothed online learning, where the distribution of data at each time is assumed to have bounded likeliehood ratio with re
Externí odkaz:
http://arxiv.org/abs/2402.14987