Showing 1 - 10 of 253 results for the search: '"Prashanth, L. A."'
We consider the problem of estimating the asymptotic variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ computation at each
External link:
http://arxiv.org/abs/2409.05733
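For orientation, here is the classical non-overlapping batch-means estimator, a standard baseline for the asymptotic variance $\sigma^2 = \lim_{n\to\infty} n\,\mathrm{Var}(\bar{f}_n)$ of a Markov chain average. It is not the recursive estimator proposed in the paper. The names `BatchMeans` and `ar1_chain` are hypothetical, and the AR(1) chain is an assumed test case chosen because its asymptotic variance, $1/(1-\phi)^2$ for unit-variance Gaussian noise, is known in closed form.

```python
import random
from typing import Iterator


class BatchMeans:
    """Non-overlapping batch-means estimate of the asymptotic variance of
    f(X_t) for a Markov chain. The batch size is fixed in advance, so each
    update costs O(1) time and memory: only the running sum of the current
    batch and Welford statistics of the completed batch means are stored."""

    def __init__(self, batch_size: int) -> None:
        self.b = batch_size
        self.batch_sum = 0.0  # sum of f-values in the current, incomplete batch
        self.batch_count = 0  # number of samples in the current batch
        self.k = 0            # number of completed batches
        self.mean = 0.0       # running mean of the completed batch means
        self.m2 = 0.0         # running sum of squared deviations (Welford)

    def update(self, y: float) -> None:
        """Feed one observation y = f(X_t)."""
        self.batch_sum += y
        self.batch_count += 1
        if self.batch_count == self.b:
            z = self.batch_sum / self.b  # mean of the batch just completed
            self.k += 1
            delta = z - self.mean
            self.mean += delta / self.k
            self.m2 += delta * (z - self.mean)
            self.batch_sum = 0.0
            self.batch_count = 0

    def estimate(self) -> float:
        """b times the sample variance of the batch means (NaN if < 2 batches)."""
        if self.k < 2:
            return float("nan")
        return self.b * self.m2 / (self.k - 1)


def ar1_chain(phi: float, n: int, seed: int = 0) -> Iterator[float]:
    """Yield n states of X_{t+1} = phi * X_t + e_t, e_t ~ N(0, 1), X_0 = 0."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        yield x


if __name__ == "__main__":
    est = BatchMeans(batch_size=400)
    for x in ar1_chain(0.5, 400_000):
        est.update(x)  # f is the identity here
    # For this chain the true asymptotic variance is 1 / (1 - 0.5)^2 = 4.
    print(est.estimate())
```

Fixing the batch size is what keeps each update $O(1)$, but it also makes this estimator inconsistent as $n \to \infty$; consistent batch-means variants let the batch size grow with $n$.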
Motivated by risk-sensitive reinforcement learning scenarios, we consider the problem of policy evaluation for variance in a discounted reward Markov decision process (MDP). For this problem, a temporal difference (TD) type learning algorithm with li
External link:
http://arxiv.org/abs/2406.07892
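One common way to learn the variance of the discounted return with TD-style updates is to track the first moment $V(s)$ together with the second moment $M(s)$, which satisfies $M(s) = \mathbb{E}[r^2 + 2\gamma r V(s') + \gamma^2 M(s')]$, and to report $M(s) - V(s)^2$. Below is a minimal tabular sketch of that idea, assuming a finite Markov reward process (a fixed policy) whose reward depends only on the transition $(s, s')$. It is not the algorithm studied in the paper, and `td_variance` and its arguments are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def td_variance(
    P: Sequence[Sequence[float]],
    R: Sequence[Sequence[float]],
    gamma: float,
    alpha: float = 0.01,
    steps: int = 200_000,
    seed: int = 0,
) -> Tuple[List[float], List[float], List[float]]:
    """Tabular TD(0) estimates of the mean V(s) and second moment M(s) of the
    discounted return, plus Var(s) = M(s) - V(s)^2.

    P[s] holds next-state probabilities; R[s][s2] is the reward on s -> s2.
    """
    rng = random.Random(seed)
    n = len(P)
    states = range(n)
    V = [0.0] * n
    M = [0.0] * n
    s = 0
    for _ in range(steps):
        s2 = rng.choices(states, weights=P[s])[0]
        r = R[s][s2]
        # Both TD errors use the current estimates, before either update.
        dv = r + gamma * V[s2] - V[s]
        dm = r * r + 2.0 * gamma * r * V[s2] + gamma * gamma * M[s2] - M[s]
        V[s] += alpha * dv
        M[s] += alpha * dm
        s = s2
    var = [m - v * v for v, m in zip(V, M)]
    return V, M, var


if __name__ == "__main__":
    # Hypothetical chain: each step enters either state with probability 1/2,
    # reward 1 on entering state 1. For gamma = 0.5, V = 1 and Var = 1/3.
    P = [[0.5, 0.5], [0.5, 0.5]]
    R = [[0.0, 1.0], [0.0, 1.0]]
    print(td_variance(P, R, gamma=0.5, alpha=0.001))
```

Note that $M(s) - V(s)^2$ is computed from two separately noisy estimates, so with a constant step size the variance estimate fluctuates more than either moment alone; smaller steps trade speed for accuracy.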
We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as well as conc
External link:
http://arxiv.org/abs/2405.20933
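For reference, the OCE of a loss $X$ with disutility $\phi$ is often written $\mathrm{OCE}_\phi(X) = \inf_{\xi} \{\xi + \mathbb{E}[\phi(X - \xi)]\}$, and its SAA replaces the expectation by a sample mean. The sketch below computes that SAA by ternary search over $\xi$, assuming $\phi$ is convex and nondecreasing with $\phi(0) = 0$ and $1 \in \partial\phi(0)$, so that a minimizer lies between the smallest and largest sample. The loss-based sign convention and the helper names `oce_saa`, `cvar_phi`, `entropic_phi` are assumptions and may differ from the paper.

```python
import math
from typing import Callable, Sequence


def oce_saa(samples: Sequence[float], phi: Callable[[float], float], iters: int = 200) -> float:
    """SAA of OCE(X) = inf_xi { xi + E[phi(X - xi)] } for a loss X."""
    n = len(samples)

    def objective(xi: float) -> float:
        return xi + sum(phi(x - xi) for x in samples) / n

    lo, hi = min(samples), max(samples)
    for _ in range(iters):  # ternary search on a convex 1-D function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) <= objective(m2):
            hi = m2  # some minimizer lies in [lo, m2]
        else:
            lo = m1  # some minimizer lies in [m1, hi]
    return objective((lo + hi) / 2)


def cvar_phi(alpha: float) -> Callable[[float], float]:
    """phi(t) = max(t, 0) / (1 - alpha): the OCE is then CVaR at level alpha."""
    return lambda t: max(t, 0.0) / (1.0 - alpha)


def entropic_phi(beta: float) -> Callable[[float], float]:
    """phi(t) = (exp(beta t) - 1) / beta: the OCE is (1/beta) log E[exp(beta X)]."""
    return lambda t: math.expm1(beta * t) / beta


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0]
    print(oce_saa(losses, cvar_phi(0.5)))  # approximately 3.5, the mean of the worst half
    print(oce_saa(losses, entropic_phi(1.0)))
```

With the CVaR choice of $\phi$ the objective is the Rockafellar-Uryasev formula, which is piecewise linear in $\xi$; a generic convex 1-D search is used here only so the same routine covers smooth disutilities as well.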
We consider the problems of estimation and optimization of utility-based shortfall risk (UBSR), which is a popular risk measure in finance. In the context of UBSR estimation, we derive a non-asymptotic bound on the mean-squared error of the classical
External link:
http://arxiv.org/abs/2310.18743
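In the usual convention for a financial position $X$ and a nondecreasing loss function $l$, $\mathrm{SR}(X) = \inf\{t : \mathbb{E}[l(-X - t)] \le \lambda\}$. Since the empirical expected loss is nonincreasing in $t$, the classical SAA estimate can be found by bracketing and bisection. The following is a minimal sketch under that assumed convention; `ubsr_saa` and its parameters are hypothetical, and the sketch does not reproduce the paper's estimation or optimization schemes.

```python
import math
from typing import Callable, Sequence


def ubsr_saa(
    samples: Sequence[float],
    loss: Callable[[float], float],
    lam: float,
    tol: float = 1e-10,
    max_doublings: int = 64,
) -> float:
    """SAA estimate of SR(X) = inf{ t : E[loss(-X - t)] <= lam }."""
    n = len(samples)

    def g(t: float) -> float:
        # Empirical expected loss after adding cash t; nonincreasing in t.
        return sum(loss(-x - t) for x in samples) / n

    # Grow a bracket lo < hi with g(lo) > lam >= g(hi).
    lo, hi = -1.0, 1.0
    for _ in range(max_doublings):
        if g(hi) <= lam:
            break
        hi *= 2.0
    else:
        raise ValueError("no t with g(t) <= lam found; is lam too small?")
    for _ in range(max_doublings):
        if g(lo) > lam:
            break
        lo *= 2.0
    else:
        raise ValueError("g(t) <= lam for every t tried; SR may be -infinity")
    # Bisection keeps g(lo) > lam >= g(hi), so SR lies in (lo, hi].
    for _ in range(500):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if g(mid) <= lam:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Exponential loss: SR = log(mean(exp(-X))) - log(lam).
    print(ubsr_saa([0.0, math.log(3.0)], math.exp, 1.0))
```

As a sanity check on the convention, the identity loss with $\lambda = 0$ gives $\mathrm{SR}(X) = -\mathbb{E}[X]$, i.e. a risk-neutral measure.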
We tackle the problem of estimating risk measures of the infinite-horizon discounted cost within a Markov cost process. The risk measures we study include variance, Value-at-Risk (VaR), and Conditional Value-at-Risk (CVaR). First, we show that estima
External link:
http://arxiv.org/abs/2310.11389
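A baseline way to estimate these quantities is plain Monte Carlo: simulate truncated trajectories, sum the discounted costs, and apply empirical risk estimators to the samples. The sketch assumes a finite Markov cost process with transition matrix `P` and state costs `C`. The truncation horizon, the VaR convention (the $\lceil \alpha n \rceil$-th smallest cost) and the function names are assumptions, and the paper's own estimators and bounds are not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate_discounted_costs(
    P: Sequence[Sequence[float]],
    C: Sequence[float],
    gamma: float,
    s0: int,
    horizon: int,
    episodes: int,
    seed: int = 0,
) -> List[float]:
    """Samples of sum_{t < horizon} gamma^t C[s_t] started from s0. If |C| <= c_max,
    truncation shifts each sample by at most gamma^horizon * c_max / (1 - gamma)."""
    rng = random.Random(seed)
    states = range(len(P))
    out = []
    for _ in range(episodes):
        s, total, disc = s0, 0.0, 1.0
        for _ in range(horizon):
            total += disc * C[s]
            disc *= gamma
            s = rng.choices(states, weights=P[s])[0]
        out.append(total)
    return out


def risk_estimates(samples: Sequence[float], alpha: float) -> Tuple[float, float, float]:
    """Empirical variance, VaR_alpha and CVaR_alpha of costs (needs n >= 2, 0 <= alpha < 1)."""
    n = len(samples)
    xs = sorted(samples)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / (n - 1)
    value_at_risk = xs[max(math.ceil(alpha * n) - 1, 0)]  # empirical alpha-quantile
    # CVaR = VaR + E[(D - VaR)^+] / (1 - alpha), with E replaced by the sample mean.
    cvar = value_at_risk + sum(max(x - value_at_risk, 0.0) for x in xs) / (n * (1 - alpha))
    return variance, value_at_risk, cvar


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical two-state chain
    C = [0.0, 1.0]                # cost 1 for each step spent in state 1
    d = simulate_discounted_costs(P, C, gamma=0.9, s0=0, horizon=200, episodes=1000)
    print(risk_estimates(d, alpha=0.95))
```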
We consider the problem of control in the setting of reinforcement learning (RL), where model information is not available. Policy gradient algorithms are a popular solution approach for this problem and are usually shown to converge to a stationary
External link:
http://arxiv.org/abs/2304.10951
We present in this paper a family of generalized simultaneous perturbation-based gradient search (GSPGS) estimators that use noisy function measurements. The number of function measurements required by each estimator is guided by the desired level of
External link:
http://arxiv.org/abs/2212.10477
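As a point of comparison for the GSPGS family, the classical two-measurement simultaneous perturbation (SPSA) gradient estimate with Rademacher perturbations is sketched below. It is not a GSPGS estimator; `spsa_gradient` and the quadratic test function are hypothetical. The estimate uses two noisy function evaluations regardless of the dimension of $x$.

```python
import random
from typing import Callable, List, Sequence


def spsa_gradient(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    c: float,
    rng: random.Random,
) -> List[float]:
    """g_i = (f(x + c*Delta) - f(x - c*Delta)) / (2 * c * Delta_i),
    with i.i.d. Rademacher (+1/-1) entries Delta_i."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)  # the only two function measurements
    return [diff / (2.0 * c * di) for di in delta]


if __name__ == "__main__":
    rng = random.Random(0)

    # Noisy measurements of f(x) = x_0^2 + 3 x_1^2; the true gradient at (1, 1) is (2, 6).
    def noisy_f(x: List[float]) -> float:
        return x[0] ** 2 + 3.0 * x[1] ** 2 + rng.gauss(0.0, 0.01)

    n = 10_000
    avg = [0.0, 0.0]
    for _ in range(n):
        g = spsa_gradient(noisy_f, [1.0, 1.0], c=0.1, rng=rng)
        avg = [a + gi / n for a, gi in zip(avg, g)]
    print(avg)  # should be roughly [2, 6]
```

A single estimate is noisy because every coordinate picks up cross terms $\Delta_j / \Delta_i$; these vanish only in expectation, which is why the usage example averages many estimates.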
Published in:
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, 2023
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does no
External link:
http://arxiv.org/abs/2210.05918
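To illustrate what tail-averaging means here, the sketch runs TD(0) with linear value estimates $V(s) = \theta^\top \phi(s)$ and a constant step size, and returns the average of the iterates from the last part of the run alongside the final iterate. The constant step, the tail fraction of one half, and the name `tail_averaged_td0` are assumptions for illustration, not necessarily the step-size choice analysed in the paper.

```python
import random
from typing import List, Sequence, Tuple


def tail_averaged_td0(
    P: Sequence[Sequence[float]],
    R: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
    gamma: float,
    alpha: float,
    steps: int,
    tail_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], List[float]]:
    """Linear TD(0) on a Markov reward process (P: transitions, R[s][s2]: rewards).
    Returns (final iterate, mean of the iterates over the last tail_fraction of steps)."""
    rng = random.Random(seed)
    states = range(len(P))
    d = len(features[0])
    theta = [0.0] * d
    tail_avg = [0.0] * d
    tail_start = int(steps * (1.0 - tail_fraction))
    count = 0
    s = 0
    for t in range(steps):
        s2 = rng.choices(states, weights=P[s])[0]
        phi, phi2 = features[s], features[s2]
        v = sum(w * p for w, p in zip(theta, phi))
        v2 = sum(w * p for w, p in zip(theta, phi2))
        td_error = R[s][s2] + gamma * v2 - v
        theta = [w + alpha * td_error * p for w, p in zip(theta, phi)]
        if t >= tail_start:  # running mean over the tail iterates only
            count += 1
            tail_avg = [a + (w - a) / count for a, w in zip(tail_avg, theta)]
        s = s2
    return theta, tail_avg


if __name__ == "__main__":
    # Hypothetical two-state chain with one-hot features; true V = [1, 1] for gamma = 0.5.
    P = [[0.5, 0.5], [0.5, 0.5]]
    R = [[0.0, 1.0], [0.0, 1.0]]
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(tail_averaged_td0(P, R, feats, gamma=0.5, alpha=0.05, steps=100_000))
```

The final iterate keeps fluctuating at a level set by the constant step size, while averaging the tail iterates damps that noise without discarding the fast initial progress that a constant step provides.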
In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples, and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation
External link:
http://arxiv.org/abs/2208.00290
In several applications, such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and
External link:
http://arxiv.org/abs/2205.05843