Showing 1 - 10 of 371 for search: '"A, Gimelfarb"'
We propose Constraint-Generation Policy Optimization (CGPO) for optimizing policy parameters within compact and interpretable policy classes for mixed discrete-continuous Markov Decision Processes (DC-MDPs). CGPO is not only able to provide bounded p
External link:
http://arxiv.org/abs/2401.12243
Author:
Gimelfarb, Michael, Kim, Michael Jong
We study parameterized MDPs (PMDPs) in which the key parameters of interest are unknown and must be learned using Bayesian inference. One key defining feature of such models is the presence of "uninformative" actions that provide no information about
External link:
http://arxiv.org/abs/2305.07844
Published in:
Croatian Medical Journal. Oct 2024, Vol. 65 Issue 5, p431-439. 9p.
Author:
Taitler, Ayal, Gimelfarb, Michael, Jeong, Jihwan, Gopalakrishnan, Sriram, Mladenov, Martin, Liu, Xiaotian, Sanner, Scott
We present pyRDDLGym, a Python framework for auto-generation of OpenAI Gym environments from an RDDL declarative description. The discrete time step evolution of variables in RDDL is described by conditional probability functions, which fits naturally i
External link:
http://arxiv.org/abs/2211.05939
Author:
Jeong, Jihwan, Wang, Xiaoyu, Gimelfarb, Michael, Kim, Hyunwoo, Abdulhai, Baher, Sanner, Scott
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since they can
External link:
http://arxiv.org/abs/2210.03802
Published in:
In World Neurosurgery January 2025 193:119-130
Author:
Igor Kisil, Yuri Gimelfarb
Published in:
Journal of Yeungnam Medical Science, Vol 40, Iss 4, Pp 364-372 (2023)
Background: Growing evidence suggests that beta-hydroxy-beta-methylbutyrate (HMB), arginine (Arg), and glutamine (Gln) positively affect wound recovery. This study investigated the effects of long-term administration of HMB/Arg/Gln on pressure ulcer (
External link:
https://doaj.org/article/b61b4f4ccd16453297f4915bb717eb81
Planning provides a framework for optimizing sequential decisions in complex environments. Recent advances in efficient planning in deterministic or stochastic high-dimensional domains with continuous action spaces leverage backpropagation through a
External link:
http://arxiv.org/abs/2106.07260
Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making. The former can be addressed by transfer learning and the latter by optimizing some utility function of the retur
External link:
http://arxiv.org/abs/2105.14127
Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which des
External link:
http://arxiv.org/abs/2007.00869