Showing 1 - 10 of 2,757 for search: '"Policy gradient method"'
This paper studies the synthesis of an active perception policy that maximizes the information leakage of the initial state in a stochastic system modeled as a hidden Markov model (HMM). Specifically, the emission function of the HMM is controllable…
External link:
http://arxiv.org/abs/2409.16439
Author:
Hultin, Hanna1 hhultin@kth.se, Hult, Henrik2 hult@kth.se, Proutiere, Alexandre3 alepro@kth.se, Samama, Samuel4 samuel.samama@seb.se, Tarighati, Ala5 ala.tarighati@seb.se
Published in:
Journal of Financial Data Science. Summer2024, Vol. 6 Issue 3, p81-114. 34p.
Robust Markov Decision Processes (RMDPs) have recently been recognized as a valuable and promising approach to discovering a policy with creditable performance, particularly in the presence of a dynamic environment and estimation errors in the transition…
External link:
http://arxiv.org/abs/2406.00274
Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important because they can benefit from off-policy data. However…
External link:
http://arxiv.org/abs/2405.02572
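Several of the results above concern policy gradient methods. As a minimal illustration of the basic idea, and not drawn from any listed paper, the following sketch runs a softmax REINFORCE update on a hypothetical two-armed bandit (the payoff values and learning rate are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)  # logits for a two-action softmax policy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

arm_rewards = [0.2, 1.0]  # hypothetical bandit payoffs
lr = 0.1                  # assumed learning rate

for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)       # sample an action from the current policy
    r = arm_rewards[a]
    grad_log = -p                # gradient of log pi(a|theta) for softmax
    grad_log[a] += 1.0
    theta += lr * r * grad_log   # REINFORCE update: reward * grad log pi

print(softmax(theta))  # probability mass should concentrate on the better arm
```

Off-policy variants, as studied in the entry above, additionally reweight such updates with importance-sampling ratios so that data collected under one policy can improve another.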
Author:
Zhang, Junyue, Mu, Yifen
Despite the significant potential for various applications, stochastic games with long-run average payoffs have received limited scholarly attention, particularly concerning the development of learning algorithms for them due to the challenges of mat…
External link:
http://arxiv.org/abs/2405.09811
Published in:
Command Control & Simulation / Zhihui Kongzhi yu Fangzhen. Oct2024, Vol. 46 Issue 5, p37-44. 8p.
Author:
Sadamoto, Tomonori, Nakamata, Fumiya
We present a model-based globally convergent policy gradient method (PGM) for linear quadratic Gaussian (LQG) control. Firstly, we establish equivalence between optimizing dynamic output feedback controllers and designing a static feedback gain for a…
External link:
http://arxiv.org/abs/2312.12173
Academic article (sign-in is required to view this result)
Autor:
HU Zhengyang, WANG Yong
Published in:
Zhihui kongzhi yu fangzhen, Vol 46, Iss 5, Pp 37-44 (2024)
This research addresses the crucial problem of collision avoidance decision making for autonomous ships under diverse encounter situations. Building upon the Deep Deterministic Policy Gradient (DDPG) algorithm, appropriate reward functions based on t…
External link:
https://doaj.org/article/da6e201d345345e9ab63214c61685904
Binary optimization has a wide range of applications in combinatorial optimization problems such as MaxCut, MIMO detection, and MaxSAT. However, these problems are typically NP-hard due to the binary constraints. We develop a novel probabilistic model…
External link:
http://arxiv.org/abs/2307.00783