Search results - "Proutière, A."

Report

Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

Authors: Stojanovic, Stefan; Jedra, Yassir; Proutiere, Alexandre

We consider the problem of learning an $\varepsilon$-optimal policy in controlled dynamical systems with low-rank latent structure. For this problem, we present LoRa-PI (Low-Rank Policy Iteration), a model-free learning algorithm alternating between

External link: http://arxiv.org/abs/2410.23434

Report

Conformal Predictions under Markovian Data

Authors: Zheng, Frédéric; Proutiere, Alexandre

We study the split Conformal Prediction method when applied to Markovian data. We quantify the gap in terms of coverage induced by the correlations in the data (compared to exchangeable data). This gap strongly depends on the mixing properties of the

External link: http://arxiv.org/abs/2407.15277

Report

Model-Free Active Exploration in Reinforcement Learning

Authors: Russo, Alessio; Proutiere, Alexandre

Published in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023)

We study the problem of exploration in Reinforcement Learning and present a novel model-free solution. We adopt an information-theoretical viewpoint and start from the instance-specific lower bound of the number of samples that have to be collected t

External link: http://arxiv.org/abs/2407.00801

Report

Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery

Authors: Jedra, Yassir; Réveillard, William; Stojanovic, Stefan; Proutiere, Alexandre

We study contextual bandits with low-rank structure where, in each round, if the (context, arm) pair $(i,j)\in [m]\times [n]$ is selected, the learner observes a noisy sample of the $(i,j)$-th entry of an unknown low-rank reward matrix. Successive co

External link: http://arxiv.org/abs/2402.15739

Report

Best Arm Identification with Fixed Budget: A Large Deviation Perspective

Authors: Wang, Po-An; Tzeng, Ruo-Chun; Proutiere, Alexandre

We consider the problem of identifying the best arm in stochastic Multi-Armed Bandits (MABs) using a fixed sampling budget. Characterizing the minimal instance-specific error probability for this problem constitutes one of the important remaining ope

External link: http://arxiv.org/abs/2312.12137

Report

Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning

Authors: Stojanovic, Stefan; Jedra, Yassir; Proutiere, Alexandre

We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure. In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may for exam

External link: http://arxiv.org/abs/2310.06793

Report

Sub-linear Regret in Adaptive Model Predictive Control

Authors: Tranos, Damianos; Proutiere, Alexandre

We consider the problem of adaptive Model Predictive Control (MPC) for uncertain linear systems with additive disturbances and with state and input constraints. We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm

External link: http://arxiv.org/abs/2310.04842

Report

On Universally Optimal Algorithms for A/B Testing

Authors: Wang, Po-An; Ariu, Kaito; Proutiere, Alexandre

We study the problem of best-arm identification with fixed budget in stochastic multi-armed bandits with Bernoulli rewards. For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that (i) performs as

External link: http://arxiv.org/abs/2308.12000

Report

Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model

Authors: Ariu, Kaito; Proutiere, Alexandre; Yun, Se-Young

We consider the problem of recovering hidden communities in the Labeled Stochastic Block Model (LSBM) with a finite number of clusters, where cluster sizes grow linearly with the total number $n$ of items. In the LSBM, a label is (independently) obse

External link: http://arxiv.org/abs/2306.12968

Report

Conformal Off-Policy Evaluation in Markov Decision Processes

Authors: Foffano, Daniele; Russo, Alessio; Proutiere, Alexandre

Published in: 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023

Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting

External link: http://arxiv.org/abs/2304.02574
