Showing 1 - 10 of 45 for the search: '"Ho, Chin Pang"'
We develop a generic policy gradient method with a global optimality guarantee for robust Markov decision processes (MDPs). While policy gradient methods are widely used for solving dynamic decision problems due to their scalable and efficient nature…
External link:
http://arxiv.org/abs/2410.22114
In this paper, we consider contextual stochastic optimization using Nadaraya-Watson kernel regression, which is one of the most common approaches in nonparametric regression. Recent studies have explored the asymptotic convergence behavior of using Nadaraya-Watson…
External link:
http://arxiv.org/abs/2407.10764
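The Nadaraya-Watson estimator mentioned in this abstract predicts at a query point by taking a kernel-weighted average of observed responses. A minimal sketch of the standard estimator, not code from the paper; the function name, Gaussian-kernel choice, and bandwidth value are illustrative:

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=1.0):
    """Nadaraya-Watson estimator with a Gaussian kernel:
    m_hat(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)."""
    # Kernel weight for every (query point, training point) pair
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    # Locally weighted average of the training responses
    return (weights * y_train[None, :]).sum(axis=1) / weights.sum(axis=1)

# Example: noiseless linear data; at the center of a symmetric grid
# the weighted average recovers the underlying function value.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x
estimate = nadaraya_watson(x, y, np.array([0.5]), bandwidth=0.1)
```

The bandwidth `h` controls the bias-variance trade-off: small `h` tracks the data closely, large `h` averages over a wider neighborhood.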
Safe corridor-based Trajectory Optimization (TO) presents an appealing approach for collision-free path planning of autonomous robots, offering global optimality through its convex formulation. The safe corridor is constructed based on the perceived…
External link:
http://arxiv.org/abs/2308.16381
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust…
External link:
http://arxiv.org/abs/2301.01045
Published in:
International Conference on Machine Learning, 2023
Robust Markov decision processes (RMDPs) provide a promising framework for computing reliable policies in the face of model errors. Many successful reinforcement learning algorithms build on variations of policy-gradient methods, but adapting these methods…
External link:
http://arxiv.org/abs/2212.10439
Published in:
Advances in Neural Information Processing Systems (NeurIPS), 2022
In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics…
External link:
http://arxiv.org/abs/2205.14202
We consider generic stochastic optimization problems in the presence of side information which enables a more insightful decision. The side information constitutes observable exogenous covariates that alter the conditional probability distribution of…
External link:
http://arxiv.org/abs/2110.04855
Robust Markov decision processes (MDPs) allow computing reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially known transition probabilities. Unfortunately, accounting for uncertainty in the transition…
External link:
http://arxiv.org/abs/2006.09484
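The worst-case reasoning behind the robust MDP entries above can be illustrated with robust value iteration over an explicitly enumerated set of candidate transition kernels. This is an illustrative sketch under simplifying assumptions (a finite, rectangular uncertainty set), not the algorithm from any of the papers listed:

```python
import numpy as np

def robust_value_iteration(P_models, R, gamma=0.9, iters=200):
    """Robust value iteration against a finite uncertainty set.

    P_models: array (M, S, A, S), M candidate transition kernels
    R: array (S, A), rewards
    Returns the robust (worst-case) value function over the M models.
    """
    V = np.zeros(P_models.shape[1])
    for _ in range(iters):
        # Q[m, s, a] = R[s, a] + gamma * sum_s' P_m(s' | s, a) V[s']
        Q = R[None, :, :] + gamma * (P_models @ V)
        # Pessimistic over models, then greedy over actions
        V = Q.min(axis=0).max(axis=1)
    return V

# Tiny 2-state, 1-action example with two candidate kernels:
# model 0 keeps the agent in place; model 1 pushes it to state 1,
# which is absorbing and yields zero reward.
P0 = np.zeros((2, 1, 2)); P0[0, 0, 0] = 1.0; P0[1, 0, 1] = 1.0
P1 = np.zeros((2, 1, 2)); P1[0, 0, 1] = 1.0; P1[1, 0, 1] = 1.0
R = np.array([[1.0], [0.0]])
V = robust_value_iteration(np.stack([P0, P1]), R)
```

In the example, the adversarial model ejects the agent from the rewarding state immediately, so the robust value of state 0 is just the one-step reward.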
Inspired by multigrid methods for linear systems of equations, multilevel optimization methods have been proposed to solve structured optimization problems. Multilevel methods make more assumptions regarding the structure of the optimization model…
External link:
http://arxiv.org/abs/1911.11366
We address the problem of computing reliable policies in reinforcement learning problems with limited data. In particular, we compute policies that achieve good returns with high confidence when deployed. This objective, known as the percentile…
External link:
http://arxiv.org/abs/1910.10786