Showing 1 - 10 of 145 for search: '"Spaan, Matthijs"'
Intelligent agents designed for interactive environments face significant challenges in text-based games, a domain that demands complex reasoning and adaptability. While agents based on large language models (LLMs) using self-reflection have shown promise…
External link:
http://arxiv.org/abs/2411.02223
In multi-task reinforcement learning, agents train on a fixed set of tasks and have to generalise to new ones. Recent work has shown that increased exploration improves this generalisation, but it remains unclear why exactly that is. In this paper, we…
External link:
http://arxiv.org/abs/2410.03565
Author:
Galesloot, Maris F. L., Suilen, Marnix, Simão, Thiago D., Carr, Steven, Spaan, Matthijs T. J., Topcu, Ufuk, Jansen, Nils
Robust POMDPs extend classical POMDPs to handle model uncertainty. Specifically, robust POMDPs exhibit so-called uncertainty sets on the transition and observation models, effectively defining ranges of probabilities. Policies for robust POMDPs must…
External link:
http://arxiv.org/abs/2408.08770
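The interval-style uncertainty sets described above admit a simple worst-case value backup. Below is a minimal Python sketch of such a robust backup for a single state-action pair of a fully observable model (a simplification of the POMDP setting in the paper); the greedy inner minimisation is a standard technique for interval uncertainty sets, and all variable names and numbers are illustrative, not taken from the paper.

```python
# Robust Bellman backup sketch: transition probabilities are only known to
# lie in intervals [lo, up], and the backup takes the worst case over that
# uncertainty set. Toy numbers throughout (hypothetical, not from the paper).

def worst_case_expectation(values, lo, up):
    """Minimise sum_s p[s] * values[s] over probability vectors p with
    lo[s] <= p[s] <= up[s] and sum(p) = 1, by greedily pushing the free
    probability mass onto the lowest-valued successor states first.
    Assumes the set is non-empty: sum(lo) <= 1 <= sum(up)."""
    n = len(values)
    p = list(lo)
    budget = 1.0 - sum(lo)  # remaining probability mass to distribute
    for s in sorted(range(n), key=lambda s: values[s]):
        extra = min(up[s] - lo[s], budget)
        p[s] += extra
        budget -= extra
    return sum(p[s] * values[s] for s in range(n))

values = [0.0, 1.0, 2.0]   # current value estimates of successor states
lo = [0.1, 0.2, 0.1]       # lower probability bounds
up = [0.7, 0.6, 0.5]       # upper probability bounds
reward, gamma = 1.0, 0.95
robust_q = reward + gamma * worst_case_expectation(values, lo, up)
print(robust_q)            # worst-case Q-value for this state-action pair
```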
One of the remaining challenges in reinforcement learning is to develop agents that can generalise to novel scenarios they might encounter once deployed. This challenge is often framed in a multi-task setting where agents train on a fixed set of tasks…
External link:
http://arxiv.org/abs/2406.08069
Author:
Oren, Yaniv, Zanger, Moritz A., van der Vaart, Pascal R., Spaan, Matthijs T. J., Bohmer, Wendelin
Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the…
External link:
http://arxiv.org/abs/2406.01423
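The abstract above defines the actor-critic loop: a policy improved with policy-gradient steps and a value estimate that tracks it. The toy NumPy sketch below shows that loop on a two-armed bandit, using the critic's baseline as the advantage signal; everything here (learning rates, reward means, seed) is a hypothetical illustration, not the paper's algorithm.

```python
# Minimal actor-critic on a one-step (bandit-like) task: the critic learns
# a scalar value baseline, the actor takes score-function gradient steps.

import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)       # actor: preferences over 2 actions
value = 0.0                # critic: scalar value baseline
alpha_pi, alpha_v = 0.1, 0.1
true_means = [0.0, 1.0]    # hypothetical expected rewards per action

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)
    r = true_means[a] + rng.normal(0.0, 0.1)
    td_error = r - value                  # critic's advantage estimate
    value += alpha_v * td_error           # critic update
    grad = -probs
    grad[a] += 1.0                        # grad of log pi(a) under softmax
    logits += alpha_pi * td_error * grad  # actor update

print(softmax(logits))  # should concentrate on the better action (index 1)
```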
Language models (LMs) have achieved impressive accuracy across a variety of tasks but remain vulnerable to high-confidence misclassifications, also referred to as unknown unknowns (UUs). These UUs cluster into blind spots in the feature space, leading…
External link:
http://arxiv.org/abs/2403.17860
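Operationally, an unknown unknown is a prediction that is wrong despite high model confidence. A minimal sketch of that definition, assuming toy probabilities, toy labels, and a hypothetical confidence threshold; the paper's actual detection method is not reproduced here.

```python
# Flag "unknown unknowns": confident predictions that are nevertheless wrong.

def find_unknown_unknowns(probs, labels, threshold=0.9):
    """Return indices where the model is confident (max prob >= threshold)
    yet its argmax prediction disagrees with the true label."""
    uus = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=p.__getitem__)
        if p[pred] >= threshold and pred != y:
            uus.append(i)
    return uus

probs = [[0.95, 0.05], [0.55, 0.45], [0.02, 0.98]]  # toy model outputs
labels = [1, 0, 1]                                   # toy ground truth
print(find_unknown_unknowns(probs, labels))          # -> [0]
```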
Policy gradient methods are widely adopted reinforcement learning algorithms for tasks with continuous action spaces. These methods succeeded in many application domains; however, because of their notorious sample inefficiency their use remains limited…
External link:
http://arxiv.org/abs/2402.12034
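Policy gradient methods of the kind this abstract refers to estimate the gradient of expected return with the score function. Below is a minimal REINFORCE sketch for a one-dimensional continuous action under a Gaussian policy, with a hypothetical quadratic reward; the high variance of this estimator is exactly the sample-inefficiency issue the abstract mentions.

```python
# REINFORCE for a continuous action: learn the mean of a Gaussian policy
# via the score-function gradient d/dmu log N(a | mu, sigma) = (a - mu)/sigma^2.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma, lr = 0.0, 0.5, 0.05
target = 2.0  # hypothetical optimal action

for _ in range(3000):
    a = rng.normal(mu, sigma)
    r = -(a - target) ** 2       # reward peaks at a == target
    grad_mu = (a - mu) / sigma**2
    mu += lr * r * grad_mu       # unbiased but high-variance update

print(mu)  # should drift toward the target action (about 2.0)
```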
Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown…
External link:
http://arxiv.org/abs/2307.14316
In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value. Since the nature of the return distribution is generally unknown a priori or…
External link:
http://arxiv.org/abs/2306.07124
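The distributional backup can be made concrete with a categorical return representation: fixed atoms carry probability mass, and the Bellman-shifted atoms are projected back onto the support (a C51-style projection). The sketch below uses toy numbers throughout and is an illustration of the general technique, not this paper's algorithm.

```python
# Categorical distributional backup: represent the return as a distribution
# over fixed atoms, shift the atoms by the Bellman operator, then project
# the shifted mass back onto the fixed support.

import numpy as np

atoms = np.linspace(-1.0, 1.0, 5)  # fixed support z_i
probs = np.full(5, 0.2)            # current return distribution
reward, gamma = 0.3, 0.9

# Bellman update moves each atom: T z_i = r + gamma * z_i (clipped to support)
tz = np.clip(reward + gamma * atoms, atoms[0], atoms[-1])

# Split each shifted atom's mass between its two neighbouring support atoms.
new_probs = np.zeros_like(probs)
dz = atoms[1] - atoms[0]
for p, z in zip(probs, tz):
    b = (z - atoms[0]) / dz        # fractional index on the support
    l, u = int(np.floor(b)), int(np.ceil(b))
    if l == u:
        new_probs[l] += p          # landed exactly on an atom
    else:
        new_probs[l] += p * (u - b)
        new_probs[u] += p * (b - l)

print(new_probs, new_probs.sum())  # a valid distribution (sums to 1)
```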
In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. …
External link:
http://arxiv.org/abs/2306.05727
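The two components this abstract names can be shown in a few lines: an exploration strategy that decides which experience is collected, and a replay buffer that decides which experience is trained on. A toy tabular sketch follows, with a hypothetical one-state, two-action environment; all constants are illustrative.

```python
# Epsilon-greedy exploration feeding a FIFO replay buffer; Q-values are
# updated from minibatches sampled uniformly from the buffer.

import random
from collections import deque

random.seed(0)
buffer = deque(maxlen=1000)        # replay buffer with FIFO eviction
q = [0.0, 0.0]                     # tabular Q-values, single state
epsilon, alpha, batch = 0.1, 0.1, 16

def step(action):                  # hypothetical one-state environment
    return random.gauss([0.0, 1.0][action], 0.1)

for t in range(2000):
    # exploration strategy: mostly greedy, sometimes uniformly random
    a = random.randrange(2) if random.random() < epsilon else q.index(max(q))
    buffer.append((a, step(a)))    # collected experience
    if len(buffer) >= batch:
        for a_i, r_i in random.sample(list(buffer), batch):  # replayed data
            q[a_i] += alpha * (r_i - q[a_i])

print(q)  # Q-values should separate the two actions (roughly 0 and 1)
```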