Showing 1 - 10 of 30 for search: '"Devraj, Adithya M."'
Assuming distributions are Gaussian often facilitates computations that are otherwise intractable. We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution …
External link: http://arxiv.org/abs/2201.01902
The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation. Originally, this was defined to be the ratio between squared expected regret and the mutual information between the …
External link: http://arxiv.org/abs/2102.09488
Author: Devraj, Adithya M., Meyn, Sean P.
Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma < 1$ is the discount factor …
External link: http://arxiv.org/abs/2002.10301
This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted …
External link: http://arxiv.org/abs/2002.02584
Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting and optimal stopping. This paper …
External link: http://arxiv.org/abs/1910.05405
Author: Devraj, Adithya M., Chen, Jianshu
We consider a generic empirical composition optimization problem, where there are empirical averages present both outside and inside nonlinear loss functions. Such a problem is of interest in various machine learning applications, and cannot be directly …
External link: http://arxiv.org/abs/1907.09150
The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of …
External link: http://arxiv.org/abs/1904.11538
Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated …
External link: http://arxiv.org/abs/1812.11137
Acceleration is an increasingly common theme in the stochastic optimization literature. The two most common examples are Nesterov's method and Polyak's momentum technique. In this paper, two new algorithms are introduced for root-finding problems: 1) …
External link: http://arxiv.org/abs/1809.06277
Author: Devraj, Adithya M., Meyn, Sean P.
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis …
External link: http://arxiv.org/abs/1707.03770