Showing 1 - 10 of 97 for the search: '"Pirotta, Matteo"'
Author:
Cetin, Edoardo, Tirinzoni, Andrea, Pirotta, Matteo, Lazaric, Alessandro, Ollivier, Yann, Touati, Ahmed
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods…
External link:
http://arxiv.org/abs/2403.13097
We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states.
External link:
http://arxiv.org/abs/2302.03789
In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector…
External link:
http://arxiv.org/abs/2212.09429
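The linear reward model described in this abstract can be sketched numerically; the dimensions, sample count, and noise level below are illustrative, and the ridge-regression step is the standard estimator used by algorithms such as LinUCB, not the paper's specific method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                               # embedding dimension (illustrative)
theta = rng.normal(size=d)          # unknown reward vector
phi = rng.normal(size=(40, d))      # embeddings of 40 observed context-arm pairs

# Rewards are linear in the embedding: r(x, a) = <theta, phi(x, a)> + noise.
rewards = phi @ theta + 0.1 * rng.normal(size=40)

# Ridge-regression estimate of theta from the observed (embedding, reward)
# pairs -- the basic building block of optimistic linear bandit algorithms.
lam = 1.0
A = phi.T @ phi + lam * np.eye(d)
theta_hat = np.linalg.solve(A, phi.T @ rewards)
```

With enough samples the estimate concentrates around the true reward vector, which is what makes embedding quality (the paper's focus) matter: a misspecified embedding breaks the linearity assumption itself.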
Author:
Chen, Yifang, Sankararaman, Karthik, Lazaric, Alessandro, Pirotta, Matteo, Karamshuk, Dmytro, Wang, Qifan, Mandyam, Karishma, Wang, Sinong, Fang, Han
Active learning with strong and weak labelers considers a practical setting where we have access to both costly but accurate strong labelers and inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting…
External link:
http://arxiv.org/abs/2211.02233
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action…)
External link:
http://arxiv.org/abs/2210.13083
We consider Contextual Bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known concave objective function, and the reward vector depends on an observed stochastic context…
External link:
http://arxiv.org/abs/2210.09957
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case…
External link:
http://arxiv.org/abs/2210.04946
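The SSP objective (minimize expected cumulative cost until a goal state is reached) can be illustrated with value iteration on a toy chain; the dynamics and costs below are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Toy SSP: states 0..3 on a chain, state 3 is the absorbing zero-cost goal.
# The single action moves right w.p. 0.9 and stays put w.p. 0.1; cost 1/step.
n_states = 4
cost = 1.0
V = np.zeros(n_states)  # V[s] = expected cost-to-go; V[3] stays 0 at the goal

# Value iteration: V(s) <- c + 0.9 * V(s+1) + 0.1 * V(s)
for _ in range(200):
    new_V = V.copy()
    for s in range(n_states - 1):
        new_V[s] = cost + 0.9 * V[s + 1] + 0.1 * V[s]
    V = new_V
```

At the fixed point $V(s) = 1/0.9 + V(s+1)$, i.e. each state adds the expected number of steps to advance one link; the sample-complexity question in the paper is how many generative-model samples are needed to learn such cost-to-go values to $\epsilon$ accuracy.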
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased \emph{evaluations} of the true reward of each arm and it selects $K$ arms with the objective of accumulating…
External link:
http://arxiv.org/abs/2112.06517
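The round structure in this abstract can be sketched as follows; the reward values, noise, and bias are made up for illustration, and the top-$K$ selection shown is only the naive baseline of trusting the evaluations, not the paper's algorithm.

```python
import numpy as np

true_rewards = np.array([0.1, 0.5, 0.3, 0.9, 0.7])  # unknown to the learner
K = 2

# Hypothetical evaluations revealed at the start of the round: the true
# rewards corrupted by independent noise plus a small systematic bias.
noise = np.array([0.04, -0.03, 0.05, 0.02, -0.01])
evaluations = true_rewards + noise + 0.02

# Naive baseline: trust the evaluations and pick the K highest-rated arms.
chosen = np.argsort(evaluations)[-K:]
collected = true_rewards[chosen].sum()
```

Here the naive rule happens to pick the two best arms; the interesting regime studied in such settings is when bias or noise is large enough that evaluations must be combined with the learner's own reward observations.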
Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information that may contain sensitive information needing protection. Inspired by this scenario, we study…
External link:
http://arxiv.org/abs/2112.06008
This paper studies privacy-preserving exploration in Markov Decision Processes (MDPs) with linear representation. We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework…
External link:
http://arxiv.org/abs/2112.01585