Showing 1 - 6 of 6
for search: '"Soffair, Nitsan"'
Author:
Soffair, Nitsan, Katz, Gilad
Discounted algorithms often encounter evaluation errors due to their reliance on short-term estimations, which can impede their efficacy in addressing simple, short-term tasks and impose undesired temporal discounts (\(\gamma\)). Interestingly, these…
External link:
http://arxiv.org/abs/2405.00877
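The temporal discount \(\gamma\) the abstract refers to enters through the standard discounted return; a minimal illustrative sketch (the `discounted_return` helper and its default `gamma` are not from the paper):

```python
def discounted_return(rewards, gamma=0.99):
    # Standard discounted return: G = sum_t gamma^t * r_t,
    # computed backwards for numerical simplicity.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With a small \(\gamma\), rewards far in the future contribute almost nothing, which is the "undesired temporal discount" effect the abstract alludes to.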
Author:
Soffair, Nitsan, Mannor, Shie
DDPG is hindered by the overestimation bias problem, wherein its $Q$-estimates tend to overstate the actual $Q$-values. Traditional solutions to this bias involve ensemble-based methods, which require significant computational resources, or complex l…
External link:
http://arxiv.org/abs/2403.05732
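For context on the overestimation bias the abstract describes, a widely used ensemble-style remedy is the TD3-style clipped double-$Q$ target, which takes the minimum of two critics' estimates; this is a standard technique from the literature, not necessarily the fix proposed in the linked paper:

```python
def clipped_double_q_target(reward, next_q1, next_q2, gamma=0.99):
    # TD3-style clipped double-Q target: taking the minimum of two
    # critics' estimates of the next state-action value counteracts
    # the tendency of a single critic to overestimate.
    return reward + gamma * min(next_q1, next_q2)
```

The min operation trades a small pessimistic bias for robustness against the compounding optimism that destabilizes DDPG.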
Std $Q$-target is a conservative, actor-critic, ensemble $Q$-learning algorithm built on a single key $Q$-formula: the $Q$-networks' standard deviation, an "uncertainty penalty" that serves as a minimalistic solution to the prob…
External link:
http://arxiv.org/abs/2402.05950
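A minimal sketch of the "uncertainty penalty" idea the abstract names, assuming the penalty is the standard deviation across an ensemble of $Q$-estimates, scaled by a coefficient `beta`; both the exact combination and `beta` are assumptions, not the paper's formula:

```python
import numpy as np

def std_q_target(q_values, beta=0.5):
    # Penalize the ensemble-mean Q-estimate by the standard
    # deviation across the Q-networks' predictions: the more the
    # networks disagree, the more conservative the target becomes.
    q = np.asarray(q_values, dtype=float)
    return q.mean() - beta * q.std()
```

When all networks agree, the std term vanishes and the target is just the mean; disagreement pushes the target down, which is the conservative behavior the abstract describes.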
Author:
Soffair, Nitsan, Mannor, Shie
MinMaxMin $Q$-learning is a novel optimistic Actor-Critic algorithm that addresses the overestimation bias ($Q$-estimates overestimating the real $Q$-values) inherent in conservative RL algorithms. Its core formula relies on the disa…
External link:
http://arxiv.org/abs/2402.05951
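The abstract cuts off at the disagreement term; one plausible reading, sketched here purely as an assumption about how such an optimistic target could look, adds a disagreement bonus (max minus min across the ensemble) to the pessimistic min-estimate:

```python
import numpy as np

def optimistic_disagreement_target(q_values, beta=0.1):
    # Illustrative only: starting from the conservative min over the
    # ensemble, add back a bonus proportional to the ensemble's
    # disagreement (max - min). The combination and beta are
    # assumptions, not the paper's actual MinMaxMin formula.
    q = np.asarray(q_values, dtype=float)
    disagreement = q.max() - q.min()
    return q.min() + beta * disagreement
```

With `beta = 0` this reduces to the fully conservative min-target; larger `beta` injects optimism where the networks disagree most.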
Author:
Soffair, Nitsan
The SOTA algorithms for addressing QDec-POMDP issues, QDec-FP and QDec-FPS, are unable to effectively tackle problems that involve different types of sensing agents. We propose a new algorithm that addresses this issue by requiring agents to adopt th…
External link:
http://arxiv.org/abs/2301.01246
Author:
Soffair, Nitsan
WQMIX, QMIX, QTRAN, and VDN are SOTA algorithms for Dec-POMDP, yet none of them can solve complex agent-cooperation domains. We give an algorithm that solves such problems. In the first stage, we solve a single-agent problem and get a policy. In the sec…
External link:
http://arxiv.org/abs/2211.15411