Showing 1 - 10 of 185
for search: '"Oliehoek, Frans"'
Published in:
Frontiers in Artificial Intelligence and Applications, vol. 392, ECAI 2024, pp. 2919-2926
Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives. An MORL agent must make decisions based on the diverse signals provided by distinct reward functions. Training an MORL agent yields a set of solutions …
External link:
http://arxiv.org/abs/2411.04784
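The "set of solutions" a MORL agent produces can be illustrated with the simplest decision-support device, linear scalarization: each weighting of the objectives collapses the vector reward to a scalar and selects one solution. A minimal sketch (function names and weights are ours, for illustration only, not from the paper):

```python
import numpy as np

def scalarize(vector_reward, weights):
    """Linear scalarization: collapse a vector reward into a scalar
    by weighting each objective (illustrative helper, our naming)."""
    v = np.asarray(vector_reward, dtype=float)
    w = np.asarray(weights, dtype=float)
    assert v.shape == w.shape
    return float(v @ w)

def weight_sweep(candidate_rewards, n=5):
    """Sweep n weightings of two objectives; for each weighting,
    return the index of the best candidate reward vector.
    Different weightings can prefer different candidates -- a toy
    version of the trade-off set MORL training produces."""
    best = []
    for w0 in np.linspace(0.0, 1.0, n):
        w = np.array([w0, 1.0 - w0])
        scores = [scalarize(v, w) for v in candidate_rewards]
        best.append(int(np.argmax(scores)))
    return best
```

For example, with candidates `[(1, 0), (0, 1), (0.6, 0.6)]` and three weightings, each weighting picks a different candidate, showing why no single solution suffices.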
This paper explores the impact of variable pragmatic competence on communicative success by simulating language learning and conversation between speakers and listeners with different levels of reasoning ability. Through studying this interaction …
External link:
http://arxiv.org/abs/2410.05851
Real-world environments require robots to continuously acquire new skills while retaining previously learned abilities, all without clearly defined task boundaries. Storing all past data to prevent forgetting is impractical due to storage …
External link:
http://arxiv.org/abs/2410.02995
Published in:
Reinforcement Learning Journal, vol. 1, no. 1, 2024, pp. TBD
In key real-world problems, full state information is sometimes available, but only at a high cost, such as activating precise yet energy-intensive sensors or consulting humans, thereby compelling the agent to operate under partial observability. For this …
External link:
http://arxiv.org/abs/2407.18812
We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective that employs a concave function of the state occupancy measure rather than a …
External link:
http://arxiv.org/abs/2405.19024
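The CURL objective mentioned in the abstract above can be written compactly; the following is the standard textbook formulation in our own notation, not quoted from the paper. With $\lambda^{\pi}$ the state-action occupancy measure of policy $\pi$, standard RL maximizes a linear functional of $\lambda^{\pi}$, while CURL maximizes a concave one:

```latex
% Standard RL: linear in the occupancy measure \lambda^\pi
\max_{\pi} \; \langle r, \lambda^{\pi} \rangle
  = \max_{\pi} \sum_{s,a} r(s,a)\,\lambda^{\pi}(s,a)

% CURL: a concave functional F of the occupancy measure
\max_{\pi} \; F\!\left(\lambda^{\pi}\right),
  \qquad F \text{ concave}
```

Choosing $F$ linear recovers standard RL, which is why CURL is described as a generalisation of the usual objective.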
Published in:
The 33rd International Joint Conference on Artificial Intelligence, 2024
Game theory provides a mathematical framework for studying the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability by the large number of strategies, precluding direct application to more complex …
External link:
http://arxiv.org/abs/2403.02227
Policy gradient methods are widely adopted reinforcement learning algorithms for tasks with continuous action spaces. These methods have succeeded in many application domains; however, because of their notorious sample inefficiency, their use remains limited …
External link:
http://arxiv.org/abs/2402.12034
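For context, the gradient estimator that policy gradient methods build on is the classical REINFORCE / policy-gradient-theorem form (standard notation, not taken from this paper):

```latex
\nabla_{\theta} J(\theta)
  = \mathbb{E}_{\tau \sim \pi_{\theta}}
    \left[ \sum_{t=0}^{T}
      \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t)\, G_t \right]
```

where $G_t$ is the return from step $t$. The estimator's high variance is one well-known source of the sample inefficiency the abstract refers to.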
Learning rewards from human behaviour or feedback is a promising approach to aligning AI systems with human values, but it fails to consistently extract correct reward functions. Interpretability tools could enable users to understand and evaluate possible …
External link:
http://arxiv.org/abs/2402.04856
Author:
Osika, Zuzanna, Salazar, Jazmin Zatarain, Roijers, Diederik M., Oliehoek, Frans A., Murukannaiah, Pradeep K.
We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms …
External link:
http://arxiv.org/abs/2311.11288
Reinforcement learning agents tend to develop habits that are effective only under specific policies. Following an initial exploration phase in which agents try out different actions, they eventually converge on a particular policy. As this occurs, the …
External link:
http://arxiv.org/abs/2306.02419