Showing 1 - 10 of 308 for search: '"Schapire, Robert E."'
A lexicographic maximum of a set $X \subseteq \mathbb{R}^n$ is a vector in $X$ whose smallest component is as large as possible, and subject to that requirement, whose second smallest component is as large as possible, and so on for the third smallest …
External link:
http://arxiv.org/abs/2405.01387
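The leximax definition above has a compact computational reading for finite sets: sort each vector's components in ascending order and compare the sorted tuples lexicographically. A minimal sketch for that finite case (the paper itself treats general $X \subseteq \mathbb{R}^n$; the function name and example vectors are illustrative, not from the paper):

```python
def leximax(X):
    """Return a lexicographic maximum of a finite collection of vectors:
    the vector whose components, sorted in ascending order, are
    lexicographically largest (smallest component as large as possible,
    then the second smallest, and so on)."""
    return max(X, key=lambda v: sorted(v))

# Example: (2, 2) beats (1, 3) because its smallest component wins (2 > 1).
print(leximax([(1, 3), (2, 2), (0, 5)]))  # (2, 2)
```

Sorting each vector first is what turns "smallest component, then second smallest, …" into an ordinary lexicographic comparison of tuples.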
We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction. In contrast to typical approaches that train the system using reward or expert supervision on …
External link:
http://arxiv.org/abs/2404.09123
We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a …
External link:
http://arxiv.org/abs/2205.14237
Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understanding such minimizers at infinity. We study astral space, a …
External link:
http://arxiv.org/abs/2205.03260
Author:
Simchowitz, Max, Tosh, Christopher, Krishnamurthy, Akshay, Hsu, Daniel, Lykouris, Thodoris, Dudík, Miroslav, Schapire, Robert E.
Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain …
External link:
http://arxiv.org/abs/2107.01509
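For readers unfamiliar with the method named in this entry, a minimal Thompson sampling loop for a Bernoulli bandit looks as follows; the Beta(1, 1) priors, helper name, and arm means are illustrative assumptions, not details from the paper:

```python
import random

def thompson_sampling(arms, rounds, seed=0):
    """Minimal Thompson sampling for a Bernoulli bandit with Beta(1, 1)
    priors (this is where domain knowledge could be encoded instead).
    Each round: sample a mean from every arm's Beta posterior, pull the
    argmax, then update that arm's posterior with the observed reward."""
    rng = random.Random(seed)
    alpha = [1.0] * len(arms)  # 1 + observed successes per arm
    beta = [1.0] * len(arms)   # 1 + observed failures per arm
    total = 0.0
    for _ in range(rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(len(arms))]
        i = max(range(len(arms)), key=lambda k: samples[k])
        reward = 1.0 if rng.random() < arms[i] else 0.0
        alpha[i] += reward
        beta[i] += 1.0 - reward
        total += reward
    return total
```

With arms of true means 0.2, 0.5, 0.8, the posterior sampling quickly concentrates pulls on the best arm, which is the explore/exploit trade-off the snippet refers to.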
Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias. This bias is typically towards a …
External link:
http://arxiv.org/abs/2006.11226
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches …
External link:
http://arxiv.org/abs/1803.01088
Author:
Dann, Christoph, Jiang, Nan, Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John, Schapire, Robert E.
We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods …
External link:
http://arxiv.org/abs/1803.00606
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own …
External link:
http://arxiv.org/abs/1612.06246
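As a rough illustration of the problem this entry poses (not the paper's actual algorithm or its guarantees), a naive master can treat each base bandit algorithm as an arm and reweight it EXP3-style using importance-weighted rewards; all names and parameters here are hypothetical:

```python
import math
import random

def naive_master(bases, rounds, lr=0.1, seed=0):
    """Naive sketch of combining bandit algorithms: keep a weight per base
    algorithm, sample one each round in proportion to the weights, let it
    act, and boost its weight by its importance-weighted reward.
    Each base is modeled as a callable returning a reward in [0, 1]."""
    rng = random.Random(seed)
    n = len(bases)
    weights = [1.0] * n
    total = 0.0
    for _ in range(rounds):
        s = sum(weights)
        probs = [w / s for w in weights]
        i = rng.choices(range(n), weights=probs)[0]
        reward = bases[i]()  # the chosen base algorithm acts this round
        weights[i] *= math.exp(lr * reward / probs[i])  # importance weighting
        total += reward
    return total
```

The subtlety the snippet alludes to is that the master only feeds data to the base it selects, so a strong base that is rarely chosen never gets to demonstrate its strength; handling that feedback starvation is what makes the problem nontrivial.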