Showing 1 - 10 of 97 for the search: '"Pirotta, Matteo"'
Author:
Cetin, Edoardo, Tirinzoni, Andrea, Pirotta, Matteo, Lazaric, Alessandro, Ollivier, Yann, Touati, Ahmed
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task. Yet, leveraging a novel testbed (MOOD) in which trajectories come from heterogeneous sources, we show that existing methods…
External link:
http://arxiv.org/abs/2403.13097
We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states.
External link:
http://arxiv.org/abs/2302.03789
In contextual linear bandits, the reward function is assumed to be a linear combination of an unknown reward vector and a given embedding of context-arm pairs. In practice, the embedding is often learned at the same time as the reward vector…
External link:
http://arxiv.org/abs/2212.09429
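The linear reward model described in this abstract can be sketched numerically; the dimensions, sample count, and noise level below are illustrative, and the ridge-regression step is the standard estimator used by algorithms such as LinUCB, not the paper's specific method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5                               # embedding dimension (illustrative)
theta = rng.normal(size=d)          # unknown reward vector
phi = rng.normal(size=(40, d))      # embeddings of 40 observed context-arm pairs

# Rewards are linear in the embedding: r(x, a) = <theta, phi(x, a)> + noise.
rewards = phi @ theta + 0.1 * rng.normal(size=40)

# Ridge-regression estimate of theta from the observed (embedding, reward)
# pairs -- the basic building block of optimistic linear bandit algorithms.
lam = 1.0
A = phi.T @ phi + lam * np.eye(d)
theta_hat = np.linalg.solve(A, phi.T @ rewards)
```

With enough samples the estimate concentrates around the true reward vector, which is what makes embedding quality (the paper's focus) matter: a misspecified embedding breaks the linearity assumption itself.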
Author:
Chen, Yifang, Sankararaman, Karthik, Lazaric, Alessandro, Pirotta, Matteo, Karamshuk, Dmytro, Wang, Qifan, Mandyam, Karishma, Wang, Sinong, Fang, Han
Active learning with strong and weak labelers considers a practical setting where we have access to both costly but accurate strong labelers and inaccurate but cheap predictions provided by weak labelers. We study this problem in the streaming setting…
External link:
http://arxiv.org/abs/2211.02233
We study the problem of representation learning in stochastic contextual linear bandits. While the primary concern in this domain is usually to find realizable representations (i.e., those that allow predicting the reward function at any context-action…)
External link:
http://arxiv.org/abs/2210.13083
We consider Contextual Bandits with Concave Rewards (CBCR), a multi-objective bandit problem where the desired trade-off between the rewards is defined by a known concave objective function, and the reward vector depends on an observed stochastic context…
External link:
http://arxiv.org/abs/2210.09957
We study the sample complexity of learning an $\epsilon$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case…
External link:
http://arxiv.org/abs/2210.04946
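The SSP objective (minimize expected cumulative cost until a goal state is reached) can be illustrated with value iteration on a toy chain; the dynamics and costs below are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Toy SSP: states 0..3 on a chain, state 3 is the absorbing zero-cost goal.
# The single action moves right w.p. 0.9 and stays put w.p. 0.1; cost 1/step.
n_states = 4
cost = 1.0
V = np.zeros(n_states)  # V[s] = expected cost-to-go; V[3] stays 0 at the goal

# Value iteration: V(s) <- c + 0.9 * V(s+1) + 0.1 * V(s)
for _ in range(200):
    new_V = V.copy()
    for s in range(n_states - 1):
        new_V[s] = cost + 0.9 * V[s + 1] + 0.1 * V[s]
    V = new_V
```

At the fixed point $V(s) = 1/0.9 + V(s+1)$, i.e. each state adds the expected number of steps to advance one link; the sample-complexity question in the paper is how many generative-model samples are needed to learn such cost-to-go values to $\epsilon$ accuracy.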
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased \emph{evaluations} of the true reward of each arm and it selects $K$ arms with the objective of accumulating…
External link:
http://arxiv.org/abs/2112.06517
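The round structure in this abstract can be sketched as follows; the reward values, noise, and bias are made up for illustration, and the top-$K$ selection shown is only the naive baseline of trusting the evaluations, not the paper's algorithm.

```python
import numpy as np

true_rewards = np.array([0.1, 0.5, 0.3, 0.9, 0.7])  # unknown to the learner
K = 2

# Hypothetical evaluations revealed at the start of the round: the true
# rewards corrupted by independent noise plus a small systematic bias.
noise = np.array([0.04, -0.03, 0.05, 0.02, -0.01])
evaluations = true_rewards + noise + 0.02

# Naive baseline: trust the evaluations and pick the K highest-rated arms.
chosen = np.argsort(evaluations)[-K:]
collected = true_rewards[chosen].sum()
```

Here the naive rule happens to pick the two best arms; the interesting regime studied in such settings is when bias or noise is large enough that evaluations must be combined with the learner's own reward observations.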
Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information that may contain sensitive information needing protection. Inspired by this scenario, we study…
External link:
http://arxiv.org/abs/2112.06008
This paper studies privacy-preserving exploration in Markov Decision Processes (MDPs) with linear representation. We first consider the setting of linear-mixture MDPs (Ayoub et al., 2020) (a.k.a. the model-based setting) and provide a unified framework…
External link:
http://arxiv.org/abs/2112.01585