Výsledky vyhledávání - "Jordan, Scott M."

Report

Position: Benchmarking is Limited in Reinforcement Learning Research

Autor: Jordan, Scott M., White, Adam, da Silva, Bruno Castro, White, Martha, Thomas, Philip S.

Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous cal

Externí odkaz: http://arxiv.org/abs/2406.16241

Zobrazit plný text záznamu

Report

A New View on Planning in Online Reinforcement Learning

Autor: Roice, Kevin, Panahi, Parham Mohammad, Jordan, Scott M., White, Adam, White, Martha

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models

Externí odkaz: http://arxiv.org/abs/2406.01562

Zobrazit plný text záznamu

Report

From Past to Future: Rethinking Eligibility Traces

Autor: Gupta, Dhawal, Jordan, Scott M., Chaudhari, Shreyas, Liu, Bo, Thomas, Philip S., da Silva, Bruno Castro

In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment

Externí odkaz: http://arxiv.org/abs/2312.12972

Zobrazit plný text záznamu

Report

Behavior Alignment via Reward Function Optimization

Autor: Gupta, Dhawal, Chandak, Yash, Jordan, Scott M., Thomas, Philip S., da Silva, Bruno Castro

Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadve

Externí odkaz: http://arxiv.org/abs/2310.19007

Zobrazit plný text záznamu

Report

Coagent Networks: Generalized and Scaled

Autor: Kostas, James E., Jordan, Scott M., Chandak, Yash, Theocharous, Georgios, Gupta, Dhawal, White, Martha, da Silva, Bruno Castro, Thomas, Philip S.

Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpr

Externí odkaz: http://arxiv.org/abs/2305.09838

Zobrazit plný text záznamu

Report

Robust Markov Decision Processes without Model Estimation

Autor: Yang, Wenhao, Wang, Han, Kozuno, Tadashi, Jordan, Scott M., Zhang, Zhihua

Robust Markov Decision Processes (MDPs) are receiving much attention in learning a robust policy which is less sensitive to environment changes. There are an increasing number of works analyzing sample-efficiency of robust MDPs. However, there are tw

Externí odkaz: http://arxiv.org/abs/2302.01248

Zobrazit plný text záznamu

Report

Towards Safe Policy Improvement for Non-Stationary MDPs

Autor: Chandak, Yash, Jordan, Scott M., Theocharous, Georgios, White, Martha, Thomas, Philip S.

Many real-world sequential decision-making problems involve critical systems with financial risks and human-life risks. While several works in the past have proposed methods that are safe for deployment, they assume that the underlying problem is sta

Externí odkaz: http://arxiv.org/abs/2010.12645

Zobrazit plný text záznamu

Report

Evaluating the Performance of Reinforcement Learning Algorithms

Autor: Jordan, Scott M., Chandak, Yash, Cohen, Daniel, Zhang, Mengxue, Thomas, Philip S.

Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argu

Externí odkaz: http://arxiv.org/abs/2006.16958

Zobrazit plný text záznamu

Report

Classical Policy Gradient: Preserving Bellman's Principle of Optimality

Autor: Thomas, Philip S., Jordan, Scott M., Chandak, Yash, Nota, Chris, Kostas, James

We propose a new objective function for finite-horizon episodic Markov decision processes that better captures Bellman's principle of optimality, and provide an expression for the gradient of the objective.
Comment: 1 page, 0 figures

Externí odkaz: http://arxiv.org/abs/1906.03063

Zobrazit plný text záznamu

Report

Distributed Evaluations: Ending Neural Point Metrics

Autor: Cohen, Daniel, Jordan, Scott M., Croft, W. Bruce

With the rise of neural models across the field of information retrieval, numerous publications have incrementally pushed the envelope of performance for a multitude of IR tasks. However, these networks often sample data in random order, are initiali

Externí odkaz: http://arxiv.org/abs/1806.03790

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání