Showing 1 - 10 of 229 results for the search: '"Zheng Weiqiang"'
Many alignment methods, including reinforcement learning from human feedback (RLHF), rely on the Bradley-Terry reward assumption, which is insufficient to capture the full range of general human preferences. To achieve robust alignment with general …
External link:
http://arxiv.org/abs/2410.23223
Author:
Cai, Yang, Farina, Gabriele, Grand-Clément, Julien, Kroer, Christian, Lee, Chung-Wei, Luo, Haipeng, Zheng, Weiqiang
Self-play via online learning is one of the premier ways to solve large-scale two-player zero-sum games, both in theory and practice. Particularly popular algorithms include optimistic multiplicative weights update (OMWU) and optimistic gradient descent-ascent …
External link:
http://arxiv.org/abs/2406.10631
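The entry above names OMWU as a popular self-play algorithm for zero-sum games. A minimal sketch of optimistic multiplicative weights in self-play on a small matrix game might look like the following; the game matrix, step size, and horizon are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def omwu_selfplay(A, T=2000, eta=0.1):
    """OMWU self-play on a zero-sum matrix game with payoff matrix A
    (row player maximizes x^T A y, column player minimizes it).
    Returns time-averaged strategies. Illustrative sketch only."""
    m, n = A.shape
    x = np.ones(m) / m
    y = np.ones(n) / n
    gx_prev = np.zeros(m)  # previous gradients, used by the optimistic step
    gy_prev = np.zeros(n)
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        gx = A @ y          # row player's payoff per action
        gy = -A.T @ x       # column player's payoff per action
        # optimistic update: step along 2*g_t - g_{t-1} instead of g_t
        x = x * np.exp(eta * (2 * gx - gx_prev))
        x /= x.sum()
        y = y * np.exp(eta * (2 * gy - gy_prev))
        y /= y.sum()
        gx_prev, gy_prev = gx, gy
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T

# Toy 2x2 zero-sum game with unique mixed equilibrium x* = y* = (0.4, 0.6).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = omwu_selfplay(A)
```

The time-averaged strategies approach the minimax equilibrium; OMWU is also known to enjoy last-iterate convergence in this setting, which is part of what makes it attractive in practice.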
Published in:
Journal of Engineering Science and Technology Review, Vol 11, Iss 6, Pp 107-115 (2018)
External link:
https://doaj.org/article/a802a73398864f4db2f30f2c11903234
While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to a coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when utilities are …
External link:
http://arxiv.org/abs/2403.08171
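The concave case mentioned above can be illustrated on a bilinear zero-sum game, where each player's utility is linear (hence concave) in their own strategy, so projected Online Gradient Descent in self-play drives time-averaged play toward equilibrium. A minimal sketch; the game matrix, step size, and horizon are illustrative assumptions:

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def ogd_selfplay(A, T=5000, eta=0.05):
    """Projected online gradient ascent for both players of a zero-sum
    matrix game (row maximizes x^T A y). Illustrative sketch only."""
    m, n = A.shape
    x = np.ones(m) / m
    y = np.ones(n) / n
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        gx = A @ y                       # gradient of row player's utility
        gy = -A.T @ x                    # gradient of column player's utility
        x = proj_simplex(x + eta * gx)   # ascent step, projected back
        y = proj_simplex(y + eta * gy)
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T

# Toy 2x2 zero-sum game with unique mixed equilibrium x* = y* = (0.4, 0.6).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = ogd_selfplay(A)
```

The individual iterates may cycle, but the empirical averages converge at the standard no-regret rate; the snippet above is the well-behaved concave baseline against which the non-concave setting of the paper is contrasted.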
We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate …
External link:
http://arxiv.org/abs/2401.15240
In this paper, we investigate a problem of actively learning a threshold in latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and latent value $v$, and it can be $only$ achieved if the threshold is lower than …
External link:
http://arxiv.org/abs/2312.04653
Author:
Cai, Yang, Farina, Gabriele, Grand-Clément, Julien, Kroer, Christian, Lee, Chung-Wei, Luo, Haipeng, Zheng, Weiqiang
Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice. Unlike algorithms such as optimistic gradient descent-ascent …
External link:
http://arxiv.org/abs/2311.00676
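For intuition about the RM$^+$ family named above: each player maintains a vector of truncated cumulative regrets and mixes over actions in proportion to its positive entries. A minimal self-play sketch on a toy zero-sum game; the matrix and horizon are illustrative assumptions, not from the paper:

```python
import numpy as np

def rm_plus_selfplay(A, T=10000):
    """Regret Matching+ self-play on a zero-sum matrix game
    (row maximizes x^T A y). Returns time-averaged strategies."""
    m, n = A.shape
    Rx = np.zeros(m)  # truncated cumulative regrets, row player
    Ry = np.zeros(n)  # truncated cumulative regrets, column player
    avg_x, avg_y = np.zeros(m), np.zeros(n)
    for _ in range(T):
        # play proportionally to positive regrets; uniform if all are zero
        x = Rx / Rx.sum() if Rx.sum() > 0 else np.ones(m) / m
        y = Ry / Ry.sum() if Ry.sum() > 0 else np.ones(n) / n
        ux = A @ y          # per-action payoffs for the row player
        uy = -A.T @ x       # per-action payoffs for the column player
        # RM+ update: add instantaneous regret, then clip at zero
        Rx = np.maximum(Rx + ux - x @ ux, 0.0)
        Ry = np.maximum(Ry + uy - y @ uy, 0.0)
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T

# Toy 2x2 zero-sum game with unique mixed equilibrium x* = y* = (0.4, 0.6).
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = rm_plus_selfplay(A)
```

Note the contrast the abstract draws: unlike OMWU or optimistic gradient methods, RM$^+$ needs no step size at all, which is one reason it dominates in large-scale practice.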
We revisit the problem of learning in two-player zero-sum Markov games, focusing on developing an algorithm that is uncoupled, convergent, and rational, with non-asymptotic convergence rates. We start from the case of a stateless matrix game with bandit …
External link:
http://arxiv.org/abs/2303.02738
Author:
Cai, Yang, Zheng, Weiqiang
We consider online learning in multi-player smooth monotone games. Existing algorithms have limitations such as (1) being applicable only to strongly monotone games; (2) lacking the no-regret guarantee; (3) having only asymptotic or slow …
External link:
http://arxiv.org/abs/2301.13120
Author:
Xia, Lirong, Zheng, Weiqiang
The computational complexity of winner determination is a classical and important problem in computational social choice. Previous work based on worst-case analysis has established NP-hardness of winner determination for some classic voting rules, such as …
External link:
http://arxiv.org/abs/2210.08173