Showing 1 - 9
of 9
results for search: '"Bhandari, Jalaj"'
Author:
Zhu, Zheqing, Braz, Rodrigo de Salvo, Bhandari, Jalaj, Jiang, Daniel, Wan, Yi, Efroni, Yonathan, Wang, Liyuan, Xu, Ruiyang, Guo, Hongbo, Nikulkov, Alex, Korenkevych, Dmytro, Dogan, Urun, Cheng, Frank, Wu, Zheng, Xu, Wanqiao
Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling parti…
External link:
http://arxiv.org/abs/2312.03814
Author:
Xu, Ruiyang, Bhandari, Jalaj, Korenkevych, Dmytro, Liu, Fan, He, Yuchen, Nikulkov, Alex, Zhu, Zheqing
Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on use…
External link:
http://arxiv.org/abs/2305.13747
Author:
Bhandari, Jalaj
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities in the past decade. With tremendous success already demonstrated for Game AI, RL offers great potential for applicat…
Author:
Bhandari, Jalaj, Russo, Daniel
We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. There has been some recent…
External link:
http://arxiv.org/abs/2007.11120
Author:
Bhandari, Jalaj, Russo, Daniel
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming t…
External link:
http://arxiv.org/abs/1906.01786
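The snippet above describes policy gradient methods as stochastic gradient descent over a parameterized policy class. As a purely illustrative sketch (not the paper's method or setting), here is a minimal REINFORCE-style update on a toy multi-armed bandit, assuming a softmax parameterization and a running-average baseline:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(means, steps=3000, lr=0.1, seed=0):
    """Stochastic gradient ascent on expected reward for a
    softmax-parameterized policy over a toy bandit (illustrative only)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(means))
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(len(means), p=probs)
        r = means[a] + rng.normal(0.0, 0.1)  # noisy reward from chosen arm
        # grad of log pi(a) under softmax: one-hot(a) - probs
        grad = -probs
        grad[a] += 1.0
        theta += lr * (r - baseline) * grad  # REINFORCE update with baseline
        baseline += 0.05 * (r - baseline)    # running-average baseline
    return softmax(theta)

# the learned policy should concentrate on the best arm (index 1 here)
probs = reinforce_bandit(np.array([0.2, 0.8, 0.5]))
```

The baseline does not change the gradient in expectation but reduces the variance of the update, which is one of the practical concerns such finite-time analyses address.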
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its t…
External link:
http://arxiv.org/abs/1806.02450
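The abstract above describes TD learning as an iterative algorithm for policy evaluation. A minimal tabular TD(0) sketch (illustrative only; the MDP arrays `P`, `R`, and `policy` are hypothetical placeholders, not from the paper):

```python
import numpy as np

def td0_evaluate(P, R, policy, gamma=0.9, alpha=0.1, steps=5000, seed=0):
    """Tabular TD(0) policy evaluation on a small MDP.

    P[s, a, s'] : transition probabilities
    R[s, a]     : expected immediate reward
    policy[s, a]: action probabilities of the policy being evaluated
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    s = 0
    for _ in range(steps):
        a = rng.choice(n_actions, p=policy[s])
        s_next = rng.choice(n_states, p=P[s, a])
        r = R[s, a]
        # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
    return V
```

On a symmetric two-state MDP with reward 1 everywhere and gamma = 0.9, the fixed point is V = 1 / (1 - 0.9) = 10 in every state, which the iterates approach.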
Academic article
This result cannot be displayed to unauthenticated users; log in to view it.
Academic article
This result cannot be displayed to unauthenticated users; log in to view it.
Published in:
Operations Research Letters, September 2016, 44(5):612-617