Showing 1 - 10 of 2,517 results for the search: '"Mutti, P."'
How can a scientist use a Reinforcement Learning (RL) algorithm to design experiments over a dynamical system's state space? In the case of finite and Markovian systems, an area called Active Exploration (AE) relaxes the optimization problem of exper…
External link:
http://arxiv.org/abs/2407.13364
The problem of pure exploration in Markov decision processes has been cast as maximizing the entropy over the state distribution induced by the agent's policy, an objective that has been extensively studied. However, little attention has been dedicated…
External link:
http://arxiv.org/abs/2406.12795
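For context, the entropy objective this abstract refers to is usually written as follows (a standard formulation from the state entropy maximization literature, not quoted from the paper): the policy $\pi$ induces a state distribution $d^{\pi}$, and the agent solves
$\max_{\pi} H(d^{\pi}) = \max_{\pi} \, -\sum_{s} d^{\pi}(s) \log d^{\pi}(s).$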
Building on the one-to-one relationship between generalized FGM copulas and multivariate Bernoulli distributions, we prove that the class of multivariate distributions with generalized FGM copulas is a convex polytope. Therefore, we find sharp bounds…
External link:
http://arxiv.org/abs/2406.10648
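For reference, the classical bivariate FGM (Farlie-Gumbel-Morgenstern) copula, of which the generalized family studied in this abstract is a multivariate extension, has the textbook form
$C(u, v) = uv \,[\, 1 + \theta (1 - u)(1 - v) \,], \qquad \theta \in [-1, 1];$
this formula is given here only as background and is not a statement taken from the paper.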
In online Inverse Reinforcement Learning (IRL), the learner can collect samples about the dynamics of the environment to improve its estimate of the reward function. Since IRL suffers from identifiability issues, many theoretical works on online IRL…
External link:
http://arxiv.org/abs/2406.03812
Recent works have studied *state entropy maximization* in reinforcement learning, in which the agent's objective is to learn a policy inducing high entropy over state visitations (Hazan et al., 2019). They typically assume full observability of the s…
External link:
http://arxiv.org/abs/2406.02295
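A minimal sketch of the quantity involved, assuming a finite, fully observed state space and sampled trajectories; the function and variable names are illustrative and are not taken from the paper:

import numpy as np

def empirical_state_entropy(trajectories, num_states):
    # Shannon entropy of the empirical state-visitation distribution.
    # trajectories: iterable of state sequences (integers in [0, num_states)).
    # Assumes full observability, i.e. visited states are recorded exactly.
    counts = np.zeros(num_states)
    for traj in trajectories:
        for s in traj:
            counts[s] += 1
    d = counts / counts.sum()   # empirical visitation distribution
    d = d[d > 0]                # drop unvisited states before taking the log
    return float(-(d * np.log(d)).sum())

# Example: two short trajectories over a 4-state space.
print(empirical_state_entropy([[0, 1, 2], [2, 3, 3]], num_states=4))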
Author:
Mutti, Mirco; Tamar, Aviv
Meta reinforcement learning sets a distribution over a set of tasks on which the agent can train at will; it is then asked to learn an optimal policy for any test task efficiently. In this paper, we consider a finite set of tasks modeled through Markov…
External link:
http://arxiv.org/abs/2406.02282
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior. It is well-known that the IRL problem is fundamentally ill-posed, i.e., many reward functions can explain the demonstrations.
External link:
http://arxiv.org/abs/2402.15392
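A standard way to see this ill-posedness (a classical result of Ng, Harada, and Russell, 1999, recalled here as background rather than taken from the abstract): any potential-based shaping of the reward,
$r'(s, a, s') = r(s, a, s') + \gamma \Phi(s') - \Phi(s),$
leaves the set of optimal policies unchanged, so expert demonstrations alone cannot distinguish $r$ from $r'$.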
The growing deployment of reinforcement learning from human feedback (RLHF) calls for a deeper theoretical investigation of its underlying models. The prevalent models of RLHF do not account for neuroscience-backed, partially-observed "internal state…
External link:
http://arxiv.org/abs/2402.03282
Posterior sampling allows the exploitation of prior knowledge about the environment's transition dynamics to improve the sample efficiency of reinforcement learning. The prior is typically specified as a class of parametric distributions, the design of which…
External link:
http://arxiv.org/abs/2310.07518
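As a concrete illustration of the kind of parametric prior over transition dynamics mentioned here, posterior-sampling methods for tabular RL often place an independent Dirichlet prior on each row of the transition matrix. The sketch below is an assumption-laden example, not the paper's construction, and the names are hypothetical:

import numpy as np

num_states, num_actions = 4, 2
# Dirichlet concentration parameters for each (state, action) pair.
alpha = np.ones((num_states, num_actions, num_states))

def observe(s, a, s_next):
    # Conjugate update: observing a transition increments the matching count.
    alpha[s, a, s_next] += 1.0

def sample_model(rng):
    # Draw one transition model from the posterior; a posterior-sampling agent
    # would plan against this sample and act on it until the next resample.
    return np.array([[rng.dirichlet(alpha[s, a]) for a in range(num_actions)]
                     for s in range(num_states)])

rng = np.random.default_rng(0)
observe(0, 1, 2)
P = sample_model(rng)   # shape (num_states, num_actions, num_states)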
Author:
Mutti, Alessandro; Semeraro, Patrizia
The key result of this paper is to find all the joint distributions of random vectors whose sums $S=X_1+\ldots+X_d$ are minimal in convex order in the class of symmetric Bernoulli distributions. The minimal convex sums distributions are known to be s…
External link:
http://arxiv.org/abs/2309.17346
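For reference, the convex order mentioned in this abstract is the standard one: $S \le_{cx} T$ means $\mathbb{E}[f(S)] \le \mathbb{E}[f(T)]$ for every convex function $f$ for which both expectations exist, so a sum that is minimal in convex order is the least spread-out sum achievable over the admissible dependence structures (a textbook definition, stated here only as context).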