Showing 1 - 10 of 495
for the search: '"KITAMURA, TOSHINORI"'
Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
Author:
Kitamura, Toshinori, Kozuno, Tadashi, Kumagai, Wataru, Hoshino, Kenta, Hosoe, Yohei, Kasaura, Kazumi, Hamaya, Masashi, Parmas, Paavo, Matsuo, Yutaka
Designing a safe policy for uncertain environments is crucial in real-world control applications. However, this challenge remains inadequately addressed within the Markov decision process (MDP) framework. This paper presents the first algorithm capable … (a generic sketch of the epigraph reformulation named in the title follows this entry)
External link:
http://arxiv.org/abs/2408.16286
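For context, the epigraph reformulation named in the title is, in its generic optimization form, the textbook trick of lifting the objective into a constraint via an auxiliary scalar; this is general background, not the paper's specific construction:

    \min_{x}\; f(x) \quad \text{s.t.} \quad g(x) \le 0
    \;\;\Longleftrightarrow\;\;
    \min_{x,\,t}\; t \quad \text{s.t.} \quad f(x) \le t,\;\; g(x) \le 0 .

In a robust constrained MDP one would read x as the policy, with the objective and constraints evaluated against an uncertainty set of transition models (a hedged reading of the title, not a statement of the paper's exact formulation).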
Author:
Kitamura, Toshinori, Kozuno, Tadashi, Kato, Masahiro, Ichihara, Yuki, Nishimori, Soichiro, Sannai, Akiyoshi, Sonoda, Sho, Kumagai, Wataru, Matsuo, Yutaka
We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs). Despite its widespread practical use, the existing theoretical literature on PD-RL algorithms for this problem only provides s… (a generic Lagrangian primal-dual sketch follows this entry)
External link:
http://arxiv.org/abs/2401.17780
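As background for the abstract above, here is a minimal sketch of a generic Lagrangian primal-dual loop for a CMDP. It illustrates the algorithm class, not the paper's method; improve_policy and eval_cost are hypothetical oracles.

    def primal_dual_cmdp(improve_policy, eval_cost, budget,
                         lam_lr=0.05, iters=200):
        """Generic Lagrangian primal-dual loop for a constrained MDP.

        A sketch of the PD scheme the abstract refers to, not the paper's
        algorithm. improve_policy(lam) is a hypothetical oracle returning a
        policy that (approximately) maximizes reward - lam * cost;
        eval_cost(policy) returns that policy's expected cost return.
        """
        lam, policy = 0.0, None
        for _ in range(iters):
            # Primal step: best response to the current Lagrange multiplier.
            policy = improve_policy(lam)
            # Dual step: projected subgradient ascent on the multiplier;
            # the constraint violation (cost - budget) is the subgradient.
            lam = max(0.0, lam + lam_lr * (eval_cost(policy) - budget))
        return policy, lam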
Author:
Kitamura, Toshinori, Kozuno, Tadashi, Tang, Yunhao, Vieillard, Nino, Valko, Michal, Yang, Wenhao, Mei, Jincheng, Ménard, Pierre, Azar, Mohammad Gheshlaghi, Munos, Rémi, Pietquin, Olivier, Geist, Matthieu, Szepesvári, Csaba, Kumagai, Wataru, Matsuo, Yutaka
Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms. However, despite the use of function approximation … (a minimal sketch of the KL-regularized update follows this entry)
External link:
http://arxiv.org/abs/2305.13185
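As a reference point for the abstract above, the following is a minimal tabular sketch of the KL-regularized greedy step that mirror-descent-style methods such as MDVI build on. It is the standard closed-form mirror descent update, not the authors' implementation; the function name and array shapes are illustrative.

    import numpy as np

    def kl_regularized_greedy_step(pi, q, tau=1.0):
        # Solves argmax_p <p, q> - tau * KL(p || pi) per state; the closed
        # form is new_pi proportional to pi * exp(q / tau).
        #   pi : (S, A) current policy, rows sum to 1
        #   q  : (S, A) action-value estimates
        logits = np.log(pi) + q / tau
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        new_pi = np.exp(logits)
        return new_pi / new_pi.sum(axis=1, keepdims=True)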
Author:
Kozuno, Tadashi, Yang, Wenhao, Vieillard, Nino, Kitamura, Toshinori, Tang, Yunhao, Mei, Jincheng, Ménard, Pierre, Azar, Mohammad Gheshlaghi, Valko, Michal, Munos, Rémi, Pietquin, Olivier, Geist, Matthieu, Szepesvári, Csaba
In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model. In particular, we analyze mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses t… (a back-of-the-envelope note on sample-complexity scale follows this entry)
External link:
http://arxiv.org/abs/2205.14211
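For a sense of scale (this is general background on the tabular generative-model setting, not a figure taken from the abstract), the minimax sample complexity in this setting scales, up to logarithmic factors, as

    \tilde{\Theta}\!\left(\frac{SA}{(1-\gamma)^{3}\,\varepsilon^{2}}\right),
    \qquad\text{e.g.}\quad
    S=100,\; A=10,\; \gamma=0.99,\; \varepsilon=0.1
    \;\Rightarrow\;
    \frac{100 \cdot 10}{(0.01)^{3}(0.1)^{2}} = 10^{11}
    \ \text{samples, ignoring constants and logs.}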
Author:
Kitamura, Toshinori, Yonetani, Ryo
We present ShinRL, an open-source library specialized for the evaluation of reinforcement learning (RL) algorithms from both theoretical and practical perspectives. Existing RL libraries typically allow users to evaluate the practical performance of deep …
External link:
http://arxiv.org/abs/2112.04123
The recent boom in the literature on entropy-regularized reinforcement learning (RL) approaches reveals that Kullback-Leibler (KL) regularization brings advantages to RL algorithms by canceling out errors under mild assumptions. However, existing analyses … (a toy illustration of the error-cancellation effect follows this entry)
External link:
http://arxiv.org/abs/2107.07659
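To make the error-cancellation claim in the abstract concrete, here is a toy numerical illustration under simplified assumptions (independent zero-mean estimation errors); it sketches the general intuition, not the paper's analysis. Unrolling the KL-regularized update pi_{k+1} ∝ pi_k · exp(Q_k / tau) shows that the policy depends on the sum (equivalently, the average) of all past Q estimates, so their errors average out:

    import numpy as np

    rng = np.random.default_rng(0)
    q_true = np.array([1.0, 0.5, 0.0, -0.5])        # ground-truth action values
    errors = rng.normal(scale=2.0, size=(500, q_true.size))
    noisy_estimates = q_true + errors               # 500 noisy Q estimates

    # A single estimate can be far off, but the running average that the
    # KL-regularized policy effectively acts on is much closer to q_true.
    single_error = np.abs(noisy_estimates[-1] - q_true).max()
    averaged_error = np.abs(noisy_estimates.mean(axis=0) - q_true).max()
    print(f"worst-case error of one estimate      : {single_error:.2f}")
    print(f"worst-case error of the averaged value: {averaged_error:.2f}")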
In this paper, we propose cautious policy programming (CPP), a novel value-based reinforcement learning (RL) algorithm that can ensure monotonic policy improvement during learning. Based on the nature of entropy-regularized RL, we derive a new entropy …
External link:
http://arxiv.org/abs/2107.05798
The oscillating performance of off-policy learning and persistent errors in the actor-critic (AC) setting call for algorithms that can learn conservatively and thus better suit stability-critical applications. In this paper, we propose a novel off-policy …
External link:
http://arxiv.org/abs/2107.05217
Author:
Kitamura, Toshinori (kitamura@institute-of-mental-health.jp), Hada, Ayako (hada@institute-of-mental-health.jp), Usui, Yuriko (y-ohashi@jiu.ac.jp), Ohashi, Yukiko
Published in:
Behavioral Sciences (2076-328X), Jul 2024, Vol. 14, Issue 7, p. 576, 20 pp.