Showing 1 - 10 of 39 for the search: '"Bai, Qinbo"'
Published in:
Foundations and Trends in Optimization, Vol. 6, No. 4, pp. 193–298, 2024
Reinforcement Learning (RL) serves as a versatile framework for sequential decision-making, finding applications across diverse domains such as robotics, autonomous driving, recommendation systems, supply chain optimization, biology, mechanics, …
External link:
http://arxiv.org/abs/2406.11481
Published in:
NeurIPS 2024
This paper explores the realm of infinite horizon average reward Constrained Markov Decision Processes (CMDPs). To the best of our knowledge, this work is the first to delve into the regret and constraint violation analysis of average reward CMDPs …
External link:
http://arxiv.org/abs/2402.02042
Published in:
AAAI 2024
In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Distinguishing itself from existing works within this context, our approach harnesses the power of the general policy gradient-based algorithm, liberating it …
External link:
http://arxiv.org/abs/2309.01922
We consider the problem of a constrained Markov decision process (CMDP) in continuous state-action spaces, where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient …
External link:
http://arxiv.org/abs/2206.05850
Author:
Kang, Shufang, Bai, Qinbo, Qin, Yana, Liang, Qiuhong, Hu, Yayun, Li, Shengkai, Luan, Guangzhong
Published in:
Food Research International, Vol. 196, November 2024
Published in:
AAAI 2022
Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints …
External link:
http://arxiv.org/abs/2109.06332
Published in:
Transactions on Machine Learning Research, Dec 2022
We consider the problem of tabular infinite horizon concave utility reinforcement learning (CURL) with convex constraints. For this, we propose a model-based learning algorithm that also achieves zero constraint violations. Assuming that the concave …
External link:
http://arxiv.org/abs/2109.05439
We consider the problem of a constrained Markov Decision Process (CMDP), where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims to maximize …
External link:
http://arxiv.org/abs/2106.06680
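Several of the entries above study variants of this constrained objective. As a hedged sketch of the standard CMDP formulation (the discounted notation $\gamma$ and the thresholds $b_k$ are my own shorthand; individual papers above may use average-reward objectives instead):

$$\max_{\pi}\; J_r(\pi) := \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\Big] \quad \text{s.t.} \quad J_{c_k}(\pi) := \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty}\gamma^{t}\, c_k(s_t,a_t)\Big] \le b_k, \quad k = 1,\dots,K.$$

Here $r$ is the reward, $c_1,\dots,c_K$ are the $K$ cost functions, and the agent maximizes expected return while keeping each expected cumulative cost within its budget.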
Many engineering problems have multiple objectives, and the overall aim is to optimize a non-linear function of these objectives. In this paper, we formulate the problem of maximizing a non-linear concave function of multiple long-term objectives. …
External link:
http://arxiv.org/abs/2105.14125
In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the transition probabilities …
External link:
http://arxiv.org/abs/2006.05961
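A common thread in the constrained-RL entries above is the primal-dual (Lagrangian) approach: ascend on the policy, descend on a multiplier that prices constraint violation. The sketch below is a generic illustration on a hypothetical one-state, two-action toy problem of my own construction; it is not the algorithm of any specific paper listed here.

```python
import math

# Hypothetical toy CMDP (a bandit): action 0 pays reward 1.0 at cost 1.0;
# action 1 pays reward 0.5 at zero cost. Constraint: expected cost <= 0.4.
rewards = [1.0, 0.5]
costs = [1.0, 0.0]
budget = 0.4

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

theta = [0.0, 0.0]        # policy logits (primal variable)
lam = 0.0                 # Lagrange multiplier (dual variable)
lr_theta, lr_lam = 0.02, 0.02
T = 20000
cost_acc = reward_acc = 0.0

for _ in range(T):
    pi = softmax(theta)
    exp_cost = sum(p * c for p, c in zip(pi, costs))
    exp_reward = sum(p * r for p, r in zip(pi, rewards))
    cost_acc += exp_cost
    reward_acc += exp_reward
    # Primal step: gradient ascent on the Lagrangian L = E[r] - lam*(E[c] - budget).
    adv = [rewards[a] - lam * costs[a] for a in range(2)]
    baseline = sum(p * v for p, v in zip(pi, adv))
    for a in range(2):
        theta[a] += lr_theta * pi[a] * (adv[a] - baseline)
    # Dual step: raise lam when the constraint is violated, project back to lam >= 0.
    lam = max(0.0, lam + lr_lam * (exp_cost - budget))

avg_cost = cost_acc / T      # time-averaged constraint value, close to the budget
avg_reward = reward_acc / T  # time-averaged return under the mixing policy
```

The time-averaged iterates, rather than the last iterate, are what such primal-dual schemes typically guarantee; here the averaged cost hovers near the 0.4 budget while the policy mixes the two actions.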