Showing 1 - 10 of 107
for search: '"Harb, Jean"'
Author:
Gulino, Cole, Fu, Justin, Luo, Wenjie, Tucker, George, Bronstein, Eli, Lu, Yiren, Harb, Jean, Pan, Xinlei, Wang, Yan, Chen, Xiangyu, Co-Reyes, John D., Agarwal, Rishabh, Roelofs, Rebecca, Lu, Yao, Montali, Nico, Mougin, Paul, Yang, Zoey, White, Brandyn, Faust, Aleksandra, McAllister, Rowan, Anguelov, Dragomir, Sapp, Benjamin
Simulation is an essential tool to develop and benchmark autonomous vehicle planning software in a safe and cost-effective manner. However, realistic simulation requires accurate modeling of nuanced and complex multi-agent interactive behaviors. To a
External link:
http://arxiv.org/abs/2310.08710
Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function
External link:
http://arxiv.org/abs/2207.01566
Many reinforcement learning algorithms use value functions to guide the search for better policies. These methods estimate the value of a single policy while generalizing across many states. The core idea of this paper is to flip this convention and
External link:
http://arxiv.org/abs/2002.11833
Author:
Schaul, Tom, van Hasselt, Hado, Modayil, Joseph, White, Martha, White, Adam, Bacon, Pierre-Luc, Harb, Jean, Mourad, Shibl, Bellemare, Marc, Precup, Doina
We want to make progress toward artificial general intelligence, namely general-purpose agents that autonomously learn how to competently act in complex environments. The purpose of this report is to sketch a research outline, share some of the most
External link:
http://arxiv.org/abs/1811.07004
We present new results on learning temporally extended actions for continuous tasks, using the options framework (Sutton et al. [1999b], Precup [2000]). In order to achieve this goal we work with the option-critic architecture (Bacon et al. [2017]) using a
External link:
http://arxiv.org/abs/1712.00004
Recent work has shown that temporally extended actions (options) can be learned fully end-to-end as opposed to being specified in advance. While the problem of "how" to learn options is increasingly well understood, the question of "what" good option
External link:
http://arxiv.org/abs/1709.04571
We explore deep reinforcement learning methods for multi-agent domains. We begin by analyzing the difficulty of traditional algorithms in the multi-agent case: Q-learning is challenged by an inherent non-stationarity of the environment, while policy
External link:
http://arxiv.org/abs/1706.02275
Author:
Harb, Jean, Precup, Doina
Eligibility traces in reinforcement learning are used as a bias-variance trade-off and can often speed up training time by propagating knowledge back over time-steps in a single update. We investigate the use of eligibility traces in combination with
External link:
http://arxiv.org/abs/1704.05495
Temporal abstraction is key to scaling up learning and planning in reinforcement learning. While planning with temporally extended actions is well understood, creating such abstractions autonomously from data has remained challenging. We tackle this
External link:
http://arxiv.org/abs/1609.05140
Published in:
Frontiers in Immunology; 2023, p1-8, 8p