Showing 1 - 10 of 185 for search: '"Paik, Myunghee Cho"'
Author:
Lee, Kyungbok, Paik, Myunghee Cho
We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov decision processes, DRUnknown, designed for situations where both the logging policy and the value function are unknown. The proposed estimator initially estimat
External link:
http://arxiv.org/abs/2404.01830
Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributi
External link:
http://arxiv.org/abs/2308.10145
Long-term care service for old people is in great demand in most aging societies. The number of nursing home residents is increasing while the number of care providers is limited. Due to the care worker shortage, care to vulnerable older resi
External link:
http://arxiv.org/abs/2303.07053
We propose a novel contextual bandit algorithm for generalized linear rewards with an $\tilde{O}(\sqrt{\kappa^{-1} \phi T})$ regret over $T$ rounds where $\phi$ is the minimum eigenvalue of the covariance of contexts and $\kappa$ is a lower bound of
External link:
http://arxiv.org/abs/2209.06983
We propose a linear contextual bandit algorithm with an $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of contexts and $T$ is the time horizon. Our proposed algorithm is equipped with a novel estimator in which exploration is embedded thro
External link:
http://arxiv.org/abs/2206.05404
Non-stationarity is ubiquitous in human behavior and addressing it in contextual bandits is challenging. Several works have addressed the problem by investigating semi-parametric contextual bandits and warned that ignoring non-stationarity could
External link:
http://arxiv.org/abs/2205.08295
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chosen arm and the rewards of other arms remain missing. The dependence of the arm choice on the past context and reward pairs compounds the complexity of
External link:
http://arxiv.org/abs/2102.01229
The Mixup method (Zhang et al. 2018), which uses linearly interpolated data, has emerged as an effective data augmentation tool to improve generalization performance and the robustness to adversarial examples. The motivation is to curtail undesirable
External link:
http://arxiv.org/abs/2012.02521
Wasserstein distributionally robust optimization (WDRO) attempts to learn a model that minimizes the local worst-case risk in the vicinity of the empirical data distribution defined by a Wasserstein ball. While WDRO has received attention as a promisin
External link:
http://arxiv.org/abs/2006.03333
Published in:
In Information Sciences October 2023 645