Showing 1 - 10 of 30 for search: '"Devraj, Adithya M."'
Assuming distributions are Gaussian often facilitates computations that are otherwise intractable. We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution …
External link: http://arxiv.org/abs/2201.01902
The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation. Originally, this was defined to be the ratio between squared expected regret and the mutual information between the …
External link: http://arxiv.org/abs/2102.09488
Author: Devraj, Adithya M., Meyn, Sean P.
Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\gamma)$, where $\gamma < 1$ is the discount factor …
External link: http://arxiv.org/abs/2002.10301
This paper concerns error bounds for recursive equations subject to Markovian disturbances. Motivating examples abound within the fields of Markov chain Monte Carlo (MCMC) and Reinforcement Learning (RL), and many of these algorithms can be interpreted …
External link: http://arxiv.org/abs/2002.02584
Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting and optimal stopping. This paper …
External link: http://arxiv.org/abs/1910.05405
Author: Devraj, Adithya M., Chen, Jianshu
We consider a generic empirical composition optimization problem, where there are empirical averages present both outside and inside nonlinear loss functions. Such a problem is of interest in various machine learning applications, and cannot be directly …
External link: http://arxiv.org/abs/1907.09150
The objective in this paper is to obtain fast converging reinforcement learning algorithms to approximate solutions to the problem of discounted cost optimal stopping in an irreducible, uniformly ergodic Markov chain, evolving on a compact subset of …
External link: http://arxiv.org/abs/1904.11538
Value functions derived from Markov decision processes arise as a central component of algorithms as well as performance metrics in many statistics and engineering applications of machine learning techniques. Computation of the solution to the associated …
External link: http://arxiv.org/abs/1812.11137
Acceleration is an increasingly common theme in the stochastic optimization literature. The two most common examples are Nesterov's method and Polyak's momentum technique. In this paper, two new algorithms are introduced for root-finding problems: 1) …
External link: http://arxiv.org/abs/1809.06277
Author: Devraj, Adithya M., Meyn, Sean P.
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis …
External link: http://arxiv.org/abs/1707.03770