Showing 1 - 7 of 7 for search: '"Nauman, Michal"'
Visual perspective-taking (VPT), the ability to understand the viewpoint of another person, enables individuals to anticipate the actions of other people. For instance, a driver can avoid accidents by assessing what pedestrians see. Humans typically…
External link: http://arxiv.org/abs/2409.12969
Sample efficiency in Reinforcement Learning (RL) has traditionally been driven by algorithmic enhancements. In this work, we demonstrate that scaling can also lead to substantial improvements. We conduct a thorough investigation into the interplay of…
External link: http://arxiv.org/abs/2405.16158
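The snippet above points to scaling as a driver of sample efficiency; below is a minimal sketch of one kind of scaling knob such a study can vary, assuming a PyTorch critic whose width and depth are configurable. The architecture, sizes, and names are illustrative assumptions, not taken from the paper.

```python
import torch.nn as nn

def make_critic(obs_dim: int, act_dim: int, width: int, depth: int) -> nn.Module:
    """Q(s, a) network whose capacity is controlled by `width` and `depth`."""
    layers, in_dim = [], obs_dim + act_dim
    for _ in range(depth):
        layers += [nn.Linear(in_dim, width), nn.LayerNorm(width), nn.ReLU()]
        in_dim = width
    layers.append(nn.Linear(in_dim, 1))  # scalar Q-value head
    return nn.Sequential(*layers)

# A baseline-sized critic versus a "scaled up" one; only the parameter count changes.
small = make_critic(obs_dim=17, act_dim=6, width=256, depth=2)
large = make_critic(obs_dim=17, act_dim=6, width=1024, depth=3)
print(sum(p.numel() for p in small.parameters()), sum(p.numel() for p in large.parameters()))
```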
Author: Nauman, Michal, Bortkiewicz, Michał, Miłoś, Piotr, Trzciński, Tomasz, Ostaszewski, Mateusz, Cygan, Marek
Recent advancements in off-policy Reinforcement Learning (RL) have significantly improved sample efficiency, primarily due to the incorporation of various forms of regularization that enable more gradient update steps than traditional agents. However…
External link: http://arxiv.org/abs/2403.00514
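As a loosely related illustration of "regularization that enables more gradient update steps", the sketch below shows a generic high update-to-data training loop with one regularizer used in this literature, periodic network resets. It runs on toy data; it is not claimed to be the paper's method.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(10, 256), nn.LayerNorm(256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.AdamW(critic.parameters(), lr=3e-4, weight_decay=1e-2)

def reset_parameters(module: nn.Module) -> None:
    """Full network reset: re-initialize every layer that defines reset_parameters()."""
    for layer in module.modules():
        if hasattr(layer, "reset_parameters"):
            layer.reset_parameters()

updates_per_env_step, reset_every, grad_step = 8, 10_000, 0
for env_step in range(100):                   # stand-in for real environment interaction
    sa = torch.randn(64, 10)                  # fake (state, action) batch from a replay buffer
    td_target = torch.randn(64, 1)            # fake bootstrapped TD target
    for _ in range(updates_per_env_step):     # many gradient steps per collected transition
        grad_step += 1
        loss = nn.functional.mse_loss(critic(sa), td_target)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if grad_step % reset_every == 0:      # periodic reset acts as the regularizer
            reset_parameters(critic)
```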
In this paper, we investigate the issue of error accumulation in critic networks updated via pessimistic temporal difference objectives. We show that the critic approximation error can be approximated via a recursive fixed-point model similar to that…
External link: http://arxiv.org/abs/2403.01014
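For context, a standard pessimistic temporal-difference target is the clipped double-Q form used by TD3/SAC-style agents: the bootstrap value is the minimum over two target critics. The toy sketch below shows that target; it is a generic example, not the paper's recursive fixed-point model.

```python
import torch
import torch.nn as nn

def pessimistic_td_target(reward, done, next_sa, target_q1, target_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the minimum of two target critics."""
    with torch.no_grad():
        next_q = torch.min(target_q1(next_sa), target_q2(next_sa))  # pessimistic bootstrap
        return reward + gamma * (1.0 - done) * next_q

# Toy usage with randomly initialized critics and fake batch data.
q1, q2 = nn.Linear(8, 1), nn.Linear(8, 1)
reward, done, next_sa = torch.randn(32, 1), torch.zeros(32, 1), torch.randn(32, 8)
y = pessimistic_td_target(reward, done, next_sa, q1, q2)
q_online = nn.Linear(8, 1)
critic_loss = nn.functional.mse_loss(q_online(torch.randn(32, 8)), y)  # fit the online critic to y
```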
Author: Nauman, Michal, Cygan, Marek
Risk-aware Reinforcement Learning (RL) algorithms like SAC and TD3 were shown empirically to outperform their risk-neutral counterparts in a variety of continuous-action tasks. However, the theoretical basis for the pessimistic objectives these algorithms…
External link: http://arxiv.org/abs/2310.19527
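One common way the literature writes a pessimistic (risk-averse) value estimate is a lower confidence bound over a critic ensemble: the ensemble mean minus beta times the ensemble standard deviation. The sketch below illustrates that formulation with toy critics; it is an assumption-laden example, not necessarily the objective analyzed in the paper.

```python
import torch
import torch.nn as nn

def lcb_value(critics, sa, beta: float = 1.0) -> torch.Tensor:
    """Lower-confidence-bound value: ensemble mean minus beta * ensemble std (beta > 0 = pessimism)."""
    qs = torch.stack([q(sa) for q in critics])        # [n_critics, batch, 1]
    return qs.mean(dim=0) - beta * qs.std(dim=0)

critics = [nn.Linear(8, 1) for _ in range(5)]         # toy critic ensemble
sa = torch.randn(32, 8)                               # fake (state, action) batch
pessimistic_q = lcb_value(critics, sa, beta=1.0)      # used e.g. as the actor's training signal
```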
Author: Nauman, Michal, Cygan, Marek
Published in: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:25769-25789, 2023
We study the variance of stochastic policy gradients (SPGs) with many action samples per state. We derive a many-actions optimality condition, which determines when many-actions SPG yields lower variance as compared to a single-action agent with prop…
External link: http://arxiv.org/abs/2210.13011
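A minimal sketch of the many-actions idea: the score-function policy gradient surrogate averages the per-state term over several sampled actions instead of one, which can lower estimator variance. The toy Gaussian policy and critic below are assumptions made for illustration.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 4, 2, 32
mean_net = nn.Linear(obs_dim, act_dim)                   # toy Gaussian policy head
log_std = torch.zeros(act_dim, requires_grad=True)
critic = nn.Linear(obs_dim + act_dim, 1)                 # toy Q(s, a)

def spg_surrogate(states: torch.Tensor, n_actions: int) -> torch.Tensor:
    """Score-function surrogate averaged over `n_actions` sampled actions per state."""
    dist = torch.distributions.Normal(mean_net(states), log_std.exp())
    actions = dist.sample((n_actions,))                  # [n_actions, batch, act_dim]
    logp = dist.log_prob(actions).sum(-1)                # [n_actions, batch]
    s = states.unsqueeze(0).expand(n_actions, -1, -1)    # align states with each action sample
    with torch.no_grad():
        q = critic(torch.cat([s, actions], dim=-1)).squeeze(-1)
    return (logp * q).mean()                             # its gradient is the SPG estimate

states = torch.randn(batch, obs_dim)
single_action = spg_surrogate(states, n_actions=1)       # standard single-action SPG
many_actions = spg_surrogate(states, n_actions=16)       # many-actions variant
```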
Author: Nauman, Michal, Hengst, Floris Den
In this paper, we propose World Model Policy Gradient (WMPG), an approach to reduce the variance of policy gradient estimates using learned world models (WMs). In WMPG, a WM is trained online and used to imagine trajectories. The imagined trajectories…
External link: http://arxiv.org/abs/2010.15622
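A hedged sketch of the general imagination-based idea: unroll the policy inside a learned dynamics and reward model and differentiate the imagined discounted return with respect to the policy. The toy models and names below are assumptions; this is not claimed to be the exact WMPG estimator.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 4, 2
dynamics = nn.Linear(obs_dim + act_dim, obs_dim)     # toy learned world model: s' = f(s, a)
reward_head = nn.Linear(obs_dim + act_dim, 1)        # toy learned reward model
policy_mean = nn.Linear(obs_dim, act_dim)
log_std = torch.zeros(act_dim, requires_grad=True)

def imagined_return(s0: torch.Tensor, horizon: int = 5, gamma: float = 0.99) -> torch.Tensor:
    """Unroll the policy inside the learned model and return the imagined discounted return."""
    s, ret, disc = s0, 0.0, 1.0
    for _ in range(horizon):
        dist = torch.distributions.Normal(policy_mean(s), log_std.exp())
        a = dist.rsample()                           # reparameterized sample keeps gradients
        sa = torch.cat([s, a], dim=-1)
        ret = ret + disc * reward_head(sa)           # imagined reward
        s = dynamics(sa)                             # imagined next state (differentiable)
        disc *= gamma
    return ret.mean()

# In practice the world model would be trained on real transitions and possibly frozen here.
loss = -imagined_return(torch.randn(32, obs_dim))    # ascend the imagined return
loss.backward()                                      # gradients flow back into the policy
```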