Showing 1 - 10 of 318 results for the search: '"Metelli, P"'
Policy search methods are crucial in reinforcement learning, offering a framework to address continuous state-action and partially observable problems. However, the complexity of exploring vast policy spaces can lead to significant inefficiencies. …
External link: http://arxiv.org/abs/2411.09900
Achieving the no-regret property for Reinforcement Learning (RL) problems in continuous state and action-space environments is one of the major open problems in the field. Existing solutions either work under very specific assumptions or achieve bounds …
External link: http://arxiv.org/abs/2410.24071
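For context on the no-regret property mentioned in the entry above, a standard formalization (the notation $J$, $\pi_t$, $\pi^*$ is illustrative, not taken from the abstract) defines the cumulative regret over $T$ rounds as

$$ \mathrm{Regret}(T) = \sum_{t=1}^{T} \big( J(\pi^*) - J(\pi_t) \big), \qquad \pi^* \in \operatorname*{arg\,max}_{\pi} J(\pi), $$

where $\pi_t$ is the policy played at round $t$ and $J(\pi)$ its expected return; an algorithm is no-regret when $\mathrm{Regret}(T)/T \to 0$, typically via a sublinear bound such as $\mathrm{Regret}(T) = \widetilde{\mathcal{O}}(\sqrt{T})$.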
Policy evaluation via Monte Carlo (MC) simulation is at the core of many MC Reinforcement Learning (RL) algorithms (e.g., policy gradient methods). In this context, the designer of the learning system specifies an interaction budget that the agent uses …
External link: http://arxiv.org/abs/2410.13463
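As a minimal sketch of Monte Carlo policy evaluation under a fixed interaction budget, the setting described in the entry above: the Gymnasium-style reset()/step() interface, the per-episode horizon, and the plain sample-mean estimator below are illustrative assumptions, not the paper's method.

    import numpy as np

    def mc_policy_evaluation(env, policy, budget, gamma=0.99, horizon=100):
        """Estimate the expected return of `policy` with plain Monte Carlo rollouts.

        `budget` is the total number of environment steps the agent may spend;
        each rollout is truncated at `horizon` steps. The Gymnasium-style
        reset()/step() interface is an assumption made for this sketch.
        """
        returns, steps_used = [], 0
        while steps_used < budget:
            state, _ = env.reset()
            ep_return, discount = 0.0, 1.0
            for _ in range(min(horizon, budget - steps_used)):
                action = policy(state)
                state, reward, terminated, truncated, _ = env.step(action)
                ep_return += discount * reward
                discount *= gamma
                steps_used += 1
                if terminated or truncated:
                    break
            returns.append(ep_return)
        # Sample mean over the collected rollouts estimates J(policy);
        # the second value is a rough standard error of that estimate.
        return np.mean(returns), np.std(returns) / np.sqrt(len(returns))

How to split the budget across rollouts (many short trajectories versus few long ones) is exactly the kind of design choice such a budget-constrained setting raises.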
Dealing with Partially Observable Markov Decision Processes (POMDPs) is a notoriously challenging task. We face an average-reward infinite-horizon POMDP setting with an unknown transition model, where we assume knowledge of the observation model. Under this assumption, …
External link: http://arxiv.org/abs/2410.01331
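For reference, the average-reward infinite-horizon objective mentioned above is commonly written (notation ours) as

$$ J(\pi) = \liminf_{T \to \infty} \frac{1}{T} \, \mathbb{E}^{\pi} \Big[ \sum_{t=1}^{T} r(s_t, a_t) \Big], $$

where in a POMDP the policy $\pi$ can condition its actions only on the history of observations, generated by the (here assumed known) observation model, while the latent states $s_t$ evolve under the unknown transition model.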
Our goal is to extract useful knowledge from demonstrations of behavior in sequential decision-making problems. Although it is well known that humans commonly engage in risk-sensitive behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) …
External link: http://arxiv.org/abs/2409.17355
Author: Genalti, Gianmarco, Mussi, Marco, Gatti, Nicola, Restelli, Marcello, Castiglioni, Matteo, Metelli, Alberto Maria
Rested and Restless Bandits are two well-known bandit settings that are useful to model real-world sequential decision-making problems in which the expected reward of an arm evolves over time, either due to the actions we perform or due to the passage of time. …
External link: http://arxiv.org/abs/2409.05980
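A toy sketch of the rested/restless distinction described in the entry above (the linear decay and drift dynamics are illustrative choices, not the models studied in the paper): a rested arm's expected reward changes only through the pulls it receives, while a restless arm's expected reward drifts with time regardless of the actions taken.

    import numpy as np

    rng = np.random.default_rng(0)

    class RestedArm:
        """Expected reward depends on how many times this arm has been pulled."""
        def __init__(self, mu0=1.0, decay=0.05):
            self.mu0, self.decay, self.pulls = mu0, decay, 0

        def pull(self, t):
            mean = self.mu0 - self.decay * self.pulls  # evolves only with own pulls
            self.pulls += 1
            return mean + rng.normal(scale=0.1)

    class RestlessArm:
        """Expected reward drifts with global time, whether or not the arm is pulled."""
        def __init__(self, mu0=1.0, drift=0.05):
            self.mu0, self.drift = mu0, drift

        def pull(self, t):
            mean = self.mu0 - self.drift * t  # evolves with the round index t
            return mean + rng.normal(scale=0.1)

    # After 10 rounds in which an arm was pulled only once, the rested arm's mean
    # is essentially unchanged, while the restless arm's mean has drifted anyway.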
$\textit{Restless Bandits}$ describe sequential decision-making problems in which the rewards evolve with time independently from the actions taken by the policy-maker. It has been shown that classical Bandit algorithms fail when the underlying environment …
External link: http://arxiv.org/abs/2409.05181
The increase of renewable energy generation towards the zero-emission target is making the problem of controlling power grids increasingly challenging. The recent series of competitions Learning To Run a Power Network (L2RPN) has encouraged the use of …
External link: http://arxiv.org/abs/2409.04467
Constrained Reinforcement Learning (CRL) tackles sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints, which are often formulated as expected costs.
External link: http://arxiv.org/abs/2407.10775
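The constrained formulation mentioned in the entry above is usually stated as a constrained optimization over policies (the single-constraint, discounted form and the symbols $J_r$, $J_c$, $\tau$ are illustrative):

$$ \max_{\pi} \; J_r(\pi) = \mathbb{E}^{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t) \Big] \quad \text{s.t.} \quad J_c(\pi) = \mathbb{E}^{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t) \Big] \le \tau, $$

i.e., maximize the expected return while keeping the expected cumulative cost below a domain-specific threshold $\tau$.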
We consider Kernelized Bandits (KBs) to optimize a function $f : \mathcal{X} \rightarrow [0,1]$ belonging to the Reproducing Kernel Hilbert Space (RKHS) $\mathcal{H}_k$. Mainstream works on kernelized bandits focus on a subgaussian noise model in which …
External link: http://arxiv.org/abs/2407.06321
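For completeness, the subgaussian noise model referenced above requires the observation noise $\epsilon_t$ in $y_t = f(x_t) + \epsilon_t$ to satisfy, for some proxy variance $\sigma^2$ (notation ours),

$$ \mathbb{E}\big[ e^{\lambda \epsilon_t} \big] \le \exp\!\Big( \frac{\lambda^{2} \sigma^{2}}{2} \Big) \quad \text{for all } \lambda \in \mathbb{R}, $$

i.e., the noise tails decay at least as fast as those of a Gaussian with variance $\sigma^2$; this is the standard assumption that, per the entry above, mainstream kernelized-bandit works adopt.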