Výsledky vyhledávání - "Tiapkin, Daniil"

Report

A New Bound on the Cumulant Generating Function of Dirichlet Processes

Autor: Perrault, Pierre, Belomestny, Denis, Ménard, Pierre, Moulines, Éric, Naumov, Alexey, Tiapkin, Daniil, Valko, Michal

In this paper, we introduce a novel approach for bounding the cumulant generating function (CGF) of a Dirichlet process (DP) $X \sim \text{DP}(\alpha \nu_0)$, using superadditivity. In particular, our key technical contribution is the demonstration o

Externí odkaz: http://arxiv.org/abs/2409.18621

Zobrazit plný text záznamu

Report

Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization

Autor: Tiapkin, Daniil, Chzhen, Evgenii, Stoltz, Gilles

In this paper, we consider the problem of learning in adversarial Markov decision processes [MDPs] with an oblivious adversary in a full-information setting. The agent interacts with an environment during $T$ episodes, each of which consists of $H$ s

Externí odkaz: http://arxiv.org/abs/2407.05704

Zobrazit plný text záznamu

Report

Improving GFlowNets with Monte Carlo Tree Search

Autor: Morozov, Nikita, Tiapkin, Daniil, Samsonov, Sergey, Naumov, Alexey, Vetrov, Dmitry

Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong con

Externí odkaz: http://arxiv.org/abs/2406.13655

Zobrazit plný text záznamu

Report

Incentivized Learning in Principal-Agent Bandit Games

Autor: Scheid, Antoine, Tiapkin, Daniil, Boursier, Etienne, Capitaine, Aymeric, Mhamdi, El Mahdi El, Moulines, Eric, Jordan, Michael I., Durmus, Alain

This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent. The principal and the agent have misaligned objectives and the choice of action is only left to the agent. Howev

Externí odkaz: http://arxiv.org/abs/2403.03811

Zobrazit plný text záznamu

Report

Model-free Posterior Sampling via Learning Rate Randomization

Autor: Tiapkin, Daniil, Belomestny, Denis, Calandriello, Daniele, Moulines, Eric, Munos, Remi, Naumov, Alexey, Perrault, Pierre, Valko, Michal, Menard, Pierre

In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior

Externí odkaz: http://arxiv.org/abs/2310.18186

Zobrazit plný text záznamu

Report

Demonstration-Regularized RL

Autor: Tiapkin, Daniil, Belomestny, Denis, Calandriello, Daniele, Moulines, Eric, Naumov, Alexey, Perrault, Pierre, Valko, Michal, Menard, Pierre

Incorporating expert demonstrations has empirically helped to improve the sample efficiency of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra information reduces RL's sample complexity. In particular, we st

Externí odkaz: http://arxiv.org/abs/2310.17303

Zobrazit plný text záznamu

Report

Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability

Autor: Samsonov, Sergey, Tiapkin, Daniil, Naumov, Alexey, Moulines, Eric

In this paper we consider the problem of obtaining sharp bounds for the performance of temporal difference (TD) methods with linear function approximation for policy evaluation in discounted Markov decision processes. We show that a simple algorithm

Externí odkaz: http://arxiv.org/abs/2310.14286

Zobrazit plný text záznamu

Report

Generative Flow Networks as Entropy-Regularized RL

Autor: Tiapkin, Daniil, Morozov, Nikita, Naumov, Alexey, Vetrov, Dmitry

The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature

Externí odkaz: http://arxiv.org/abs/2310.12934

Zobrazit plný text záznamu

Report

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

Autor: Belomestny, Denis, Menard, Pierre, Naumov, Alexey, Tiapkin, Daniil, Valko, Michal

In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obta

Externí odkaz: http://arxiv.org/abs/2304.03056

Zobrazit plný text záznamu

Report

Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

Autor: Schechtman, Sholom, Tiapkin, Daniil, Muehlebach, Michael, Moulines, Eric

We consider the problem of minimizing a non-convex function over a smooth manifold $\mathcal{M}$. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM) which only requires computing a projection onto a vector spa

Externí odkaz: http://arxiv.org/abs/2303.09261

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání