Showing 1 - 10 of 24 for search: '"Skalse, Joar"'
In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the tra…
External link:
http://arxiv.org/abs/2406.15753
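The abstract above contrasts a reward model's fit with the regret of the policy trained against it. A minimal sketch of that gap, in a one-step setting with invented numbers (not taken from the paper): the learned reward has small average error yet ranks a suboptimal action highest, so the greedy policy against it incurs regret under the true reward.

import numpy as np

# Hypothetical one-step environment with 4 actions (illustrative numbers only).
true_reward = np.array([1.00, 0.90, 0.50, 0.10])

# A "learned" reward model with small average error ...
learned_reward = np.array([0.88, 0.92, 0.48, 0.12])
mean_abs_error = np.mean(np.abs(learned_reward - true_reward))  # = 0.045

# ... whose greedy policy nevertheless picks a suboptimal action.
greedy_action = int(np.argmax(learned_reward))            # action 1, not 0
regret = true_reward.max() - true_reward[greedy_action]   # = 0.10

print(f"mean abs error of reward model: {mean_abs_error:.3f}")
print(f"regret of the induced greedy policy: {regret:.2f}")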
Author:
Dalrymple, David "davidad", Skalse, Joar, Bengio, Yoshua, Russell, Stuart, Tegmark, Max, Seshia, Sanjit, Omohundro, Steve, Szegedy, Christian, Goldhaber, Ben, Ammann, Nora, Abate, Alessandro, Halpern, Joe, Barrett, Clark, Zhao, Ding, Zhi-Xuan, Tan, Wing, Jeannette, Tenenbaum, Joshua
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper…
External link:
http://arxiv.org/abs/2405.06624
Author:
Skalse, Joar, Abate, Alessandro
Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $\pi$). To do this, we need a behavioural model of how $\pi$ relates to $R$. In the current…
External link:
http://arxiv.org/abs/2403.06854
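One common behavioural model in this setting is Boltzmann rationality, where the probability of an action grows exponentially with its value under $R$. The sketch below is a generic single-state illustration (the Q-values and temperatures are invented), not the specific construction analysed in the paper.

import numpy as np

def boltzmann_policy(q_values: np.ndarray, beta: float) -> np.ndarray:
    """Boltzmann-rational policy: pi(a) proportional to exp(beta * Q(a)).

    beta -> 0 gives uniform random behaviour; beta -> inf approaches optimality.
    """
    logits = beta * q_values
    logits -= logits.max()        # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Illustrative Q-values derived from some reward function R (numbers invented).
q = np.array([2.0, 1.5, 0.0])
for beta in (0.1, 1.0, 10.0):
    print(beta, boltzmann_policy(q, beta).round(3))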
Author:
Skalse, Joar, Abate, Alessandro
Published in:
Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, PMLR 216:1974-1984, 2023
In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL), and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks; multi-objective RL, risk-sensitive…
External link:
http://arxiv.org/abs/2401.14811
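For the risk-sensitive class mentioned above, one way to see the kind of limitation at stake (a paraphrased argument, not quoted from the paper) uses the occupancy-measure view: expected discounted return under any fixed Markovian reward is linear in the discounted state-action occupancy measure, while a variance-penalised objective is not, so the latter cannot in general be reproduced by maximising expected return under any single scalar Markovian reward.

% Expected return is a linear functional of the occupancy measure d^\pi:
J_R(\pi) \;=\; \mathbb{E}_\pi\!\Big[\textstyle\sum_{t \ge 0} \gamma^t R(s_t, a_t)\Big]
        \;=\; \sum_{s,a} d^\pi(s,a)\, R(s,a),
\qquad
J_{\text{risk}}(\pi) \;=\; \mathbb{E}_\pi[G] \;-\; \lambda\,\mathrm{Var}_\pi[G].
% J_R is linear in d^\pi for every Markovian R, but Var_\pi[G] is not, so in
% general no fixed R' makes J_{R'} coincide with J_{risk} across all policies.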
Author:
Subramani, Rohan, Williams, Marcus, Heitmann, Max, Holm, Halfdan, Griffin, Charlie, Skalse, Joar
Most algorithms in reinforcement learning (RL) require that the objective is formalised with a Markovian reward function. However, it is well-known that certain tasks cannot be expressed by means of an objective in the Markov rewards formalism, motiv…
External link:
http://arxiv.org/abs/2310.11840
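A standard illustration of a task that falls outside the Markov-reward formalism (offered here as generic background with invented state names, not as an example from the paper) is the sequencing objective "reach A, then reach B": the payoff for reaching B depends on whether A was visited earlier, which is not a function of the current environment state alone, though it becomes Markovian once the state is augmented with a memory bit.

# Sketch of a two-stage "reach A, then reach B" task (state names invented).
# The reward depends on history, so it is modelled with an explicit memory bit.

def reward(state: str, visited_a: bool) -> tuple[float, bool]:
    """Return (reward, updated memory). Markovian only in (state, visited_a)."""
    if state == "A":
        return 0.0, True
    if state == "B" and visited_a:
        return 1.0, visited_a          # B only pays off after A was visited
    return 0.0, visited_a

mem = False
for s in ["start", "B", "A", "B"]:     # visiting B before A earns nothing
    r, mem = reward(s, mem)
    print(s, r)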
Author:
Karwowski, Jacek, Hayman, Oliver, Bai, Xingjian, Kiendlhofer, Klaus, Griffin, Charlie, Skalse, Joar
Implementing a reward function that perfectly captures a complex task in the real world is impractical. As a result, it is often appropriate to think of the reward function as a proxy for the true objective rather than as its definition. We study this…
External link:
http://arxiv.org/abs/2310.09144
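To make the "proxy, not definition" point concrete, here is a toy one-step sketch with invented numbers (not the paper's setting): as optimisation pressure against the proxy increases, the proxy value of the resulting policy keeps rising while the gap between proxy value and true value widens.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-step setting: many actions, proxy = true reward + error.
n_actions = 1000
true_r = rng.normal(size=n_actions)
proxy_r = true_r + rng.normal(scale=1.0, size=n_actions)   # imperfect proxy

def softmax(x, beta):
    z = beta * (x - x.max())
    p = np.exp(z)
    return p / p.sum()

# Increasing optimisation pressure against the proxy.
for beta in (0.0, 1.0, 5.0, 50.0):
    pi = softmax(proxy_r, beta)
    print(f"beta={beta:5.1f}  proxy value={pi @ proxy_r: .3f}  true value={pi @ true_r: .3f}")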
Author:
Skalse, Joar, Farnik, Lucy, Motwani, Sumeet Ramesh, Jenner, Erik, Gleave, Adam, Abate, Alessandro
In order to solve a task using reinforcement learning, it is necessary to first formalise the goal of that task as a reward function. However, for many real-world tasks, it is very difficult to manually specify a reward function that never incentivises…
External link:
http://arxiv.org/abs/2309.15257
Published in:
IJCAI 2022; Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence. Main Track, Pages 3430-3436
In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward signal, and…
External link:
http://arxiv.org/abs/2212.13769
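The lexicographic objective described above can be made concrete with a small action-selection sketch (a generic illustration with invented Q-values and a hypothetical tolerance parameter, not the learning algorithm introduced in the paper): among the actions that are near-optimal for the first objective, pick the one that is best for the second.

import numpy as np

def lexicographic_greedy(q1: np.ndarray, q2: np.ndarray, tol: float = 1e-6) -> int:
    """Pick an action that (approximately) maximises q1 and, subject to that,
    maximises q2. A generic sketch of lexicographic action selection."""
    best1 = q1.max()
    candidates = np.flatnonzero(q1 >= best1 - tol)   # near-optimal for objective 1
    return int(candidates[np.argmax(q2[candidates])])

# Invented Q-values for a single state, two objectives.
q_primary   = np.array([1.0, 1.0, 0.2])
q_secondary = np.array([0.1, 0.9, 5.0])
print(lexicographic_greedy(q_primary, q_secondary))   # -> 1, not 2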
Author:
Skalse, Joar, Abate, Alessandro
Published in:
Proceedings of the AAAI Conference on Artificial Intelligence, 2023
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. To do this, we need a model of how $\pi$ relates to $R$. In the current literature, the most common models are optimality, Boltzmann rationality, and…
External link:
http://arxiv.org/abs/2212.03201
We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, $\mathcal{\tilde{R}}$, leads to poor performance according to the true reward function, $\mathcal{R}$. We say that a proxy is…
External link:
http://arxiv.org/abs/2209.13085
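The shape of the definition the abstract refers to can be sketched as a condition on how the proxy and the true reward order policies (a paraphrase for orientation, not the paper's exact statement): a proxy is hackable when an apparent improvement under the proxy can be a strict regression under the true reward.

% Writing J_R(\pi) for the expected return of \pi under reward R, the proxy
% \mathcal{\tilde{R}} is hackable relative to \mathcal{R} (on a policy set \Pi)
% if there exist \pi, \pi' \in \Pi such that
J_{\mathcal{\tilde{R}}}(\pi) \;<\; J_{\mathcal{\tilde{R}}}(\pi')
\quad\text{and}\quad
J_{\mathcal{R}}(\pi) \;>\; J_{\mathcal{R}}(\pi').
% Unhackability is the negation: improving the proxy never strictly hurts
% the true objective.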