Showing 1 - 10 of 36 results for search: '"Ramponi, Giorgia"'
In this paper, we establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(\epsilon^{-3})$, advancing beyond the existing local convergence results. Previous works provide local convergence…
External link:
http://arxiv.org/abs/2410.08868
Applying reinforcement learning (RL) to real-world problems is often made challenging by the inability to interact with the environment and the difficulty of designing reward functions. Offline RL addresses the first challenge by considering access to…
External link:
http://arxiv.org/abs/2406.18450
In various applications, the optimal policy in a strategic decision-making problem depends both on the environmental configuration and exogenous events. For these settings, we introduce Bilevel Optimization with Contextual Markov Decision Processes (…)
External link:
http://arxiv.org/abs/2406.01575
Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known…
External link:
http://arxiv.org/abs/2402.15776
Posterior sampling allows exploitation of prior knowledge on the environment's transition dynamics to improve the sample efficiency of reinforcement learning. The prior is typically specified as a class of parametric distributions, the design of which…
External link:
http://arxiv.org/abs/2310.07518
Author:
Ramponi, Giorgia, Kolev, Pavel, Pietquin, Olivier, He, Niao, Laurière, Mathieu, Geist, Matthieu
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs…
External link:
http://arxiv.org/abs/2306.14799
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives, but also…
External link:
http://arxiv.org/abs/2306.07749
Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods…
External link:
http://arxiv.org/abs/2306.07001
Policy Optimization (PO) algorithms have proven particularly suited to handle the high dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods represent a popular approach to stabilize the…
External link:
http://arxiv.org/abs/2210.11137
Author:
Sanyal, Amartya, Ramponi, Giorgia
Online learning, in the mistake-bound model, is one of the most fundamental concepts in learning theory. Differential privacy, instead, is the most widely used statistical notion of privacy in the machine learning community. It is thus clear that de…
External link:
http://arxiv.org/abs/2210.04817