Výsledky vyhledávání - "Rakhlin, Alexander"

Report

Refined Risk Bounds for Unbounded Losses via Transductive Priors

Autor: Qian, Jian, Rakhlin, Alexander, Zhivotovskiy, Nikita

We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression, all characterized by unbounded losses in the setup where no assumptions are made on the magnitude of desi

Externí odkaz: http://arxiv.org/abs/2410.21621

Zobrazit plný text záznamu

Report

How Does Variance Shape the Regret in Contextual Bandits?

Autor: Jia, Zeyu, Qian, Jian, Rakhlin, Alexander, Wei, Chen-Yu

We consider realizable contextual bandits with general function approximation, investigating how small reward variance can lead to better-than-minimax regret bounds. Unlike in minimax bounds, we show that the eluder dimension $d_\text{elu}$$-$a compl

Externí odkaz: http://arxiv.org/abs/2410.12713

Zobrazit plný text záznamu

Report

Assouad, Fano, and Le Cam with Interaction: A Unifying Lower Bound Framework and Characterization for Bandit Learnability

Autor: Chen, Fan, Foster, Dylan J., Han, Yanjun, Qian, Jian, Rakhlin, Alexander, Xu, Yunbei

In this paper, we develop a unified framework for lower bound methods in statistical estimation and interactive decision making. Classical lower bound techniques -- such as Fano's inequality, Le Cam's method, and Assouad's lemma -- have been central

Externí odkaz: http://arxiv.org/abs/2410.05117

Zobrazit plný text záznamu

Report

Random Latent Exploration for Deep Reinforcement Learning

Autor: Mahankali, Srinath, Hong, Zhang-Wei, Sekhari, Ayush, Rakhlin, Alexander, Agrawal, Pulkit

The ability to efficiently explore high-dimensional state spaces is essential for the practical success of deep Reinforcement Learning (RL). This paper introduces a new exploration technique called Random Latent Exploration (RLE), that combines the s

Externí odkaz: http://arxiv.org/abs/2407.13755

Zobrazit plný text záznamu

Report

Near-Optimal Learning and Planning in Separated Latent MDPs

Autor: Chen, Fan, Daskalakis, Constantinos, Golowich, Noah, Rakhlin, Alexander

We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs). In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs. To sidestep known impossibilit

Externí odkaz: http://arxiv.org/abs/2406.07920

Zobrazit plný text záznamu

Report

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

Autor: Xie, Tengyang, Foster, Dylan J., Krishnamurthy, Akshay, Rosset, Corby, Awadallah, Ahmed, Rakhlin, Alexander

Reinforcement learning from human feedback (RLHF) has emerged as a central tool for language model alignment. We consider online exploration in RLHF, which exploits interactive access to human or AI feedback by deliberately encouraging the model to p

Externí odkaz: http://arxiv.org/abs/2405.21046

Zobrazit plný text záznamu

Report

The Power of Resets in Online Reinforcement Learning

Autor: Mhammedi, Zakaria, Foster, Dylan J., Rakhlin, Alexander

Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simul

Externí odkaz: http://arxiv.org/abs/2404.15417

Zobrazit plný text záznamu

Report

Online Estimation via Offline Estimation: An Information-Theoretic Framework

Autor: Foster, Dylan J., Han, Yanjun, Qian, Jian, Rakhlin, Alexander

$ $The classical theory of statistical estimation aims to estimate a parameter of interest under data generated from a fixed design ("offline estimation"), while the contemporary theory of online learning provides algorithms for estimation under adap

Externí odkaz: http://arxiv.org/abs/2404.10122

Zobrazit plný text záznamu

Report

Offline Reinforcement Learning: Role of State Aggregation and Trajectory Data

Autor: Jia, Zeyu, Rakhlin, Alexander, Sekhari, Ayush, Wei, Chen-Yu

We revisit the problem of offline reinforcement learning with value function realizability but without Bellman completeness. Previous work by Xie and Jiang (2021) and Foster et al. (2022) left open the question whether a bounded concentrability coeff

Externí odkaz: http://arxiv.org/abs/2403.17091

Zobrazit plný text záznamu

Report

On the Performance of Empirical Risk Minimization with Smoothed Data

Autor: Block, Adam, Rakhlin, Alexander, Shetty, Abhishek

In order to circumvent statistical and computational hardness results in sequential decision-making, recent work has considered smoothed online learning, where the distribution of data at each time is assumed to have bounded likeliehood ratio with re

Externí odkaz: http://arxiv.org/abs/2402.14987

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání