Showing 1 - 10 of 63 for the search: '"Ariu, Kaito"'
This paper presents a payoff perturbation technique, introducing strong convexity to players' payoff functions in games. This technique is specifically designed for first-order methods to achieve last-iterate convergence in games where the gradient …
External link: http://arxiv.org/abs/2410.02388
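The perturbation idea summarized above can be sketched on a bilinear zero-sum game: each player's payoff gets a strongly concave (resp. convex) quadratic term, after which plain gradient dynamics, which would cycle or diverge on the unperturbed game, contract to the perturbed game's equilibrium in the last iterate. Everything here (the payoff matrix, step size, and origin anchor) is illustrative and not the paper's full method, which also handles recovering the original equilibrium.

```python
import numpy as np

def perturbed_gda(A, steps=2000, eta=0.05, mu=0.5):
    # Gradient ascent-descent on the bilinear game x^T A y, with each
    # payoff perturbed by a quadratic term anchored at the origin:
    #   X maximizes  x^T A y - (mu/2) ||x||^2
    #   Y minimizes  x^T A y + (mu/2) ||y||^2
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    for _ in range(steps):
        gx = A @ y - mu * x        # X's perturbed gradient
        gy = -A.T @ x - mu * y     # Y's perturbed gradient
        x, y = x + eta * gx, y + eta * gy
    return x, y
```

With mu = 0 the iterates of this scheme spiral outward on a bilinear game; any mu > 0 makes the linear dynamics a strict contraction for a small enough step size.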
Synchronization behind Learning in Periodic Zero-Sum Games Triggers Divergence from Nash equilibrium
Learning in zero-sum games studies situations where multiple agents competitively learn their strategies. In such multi-agent learning, we often see that the strategies cycle around their optimum, i.e., the Nash equilibrium. When a game periodically varies …
External link: http://arxiv.org/abs/2408.10595
We study the matroid semi-bandits problem, where at each round the learner plays a subset of $K$ arms from a feasible set, and the goal is to maximize the expected cumulative linear rewards. Existing algorithms have per-round time complexity at least …
External link: http://arxiv.org/abs/2405.17968
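A minimal sketch of the per-round selection step for the simplest matroid, the uniform matroid (any subset of at most K arms is feasible), where the greedy rule reduces to taking the K largest indices. The UCB index and function names are illustrative; the paper's contribution concerns making this step faster for general matroids, which this sketch does not attempt.

```python
import math

def ucb_indices(means, counts, t):
    # Standard UCB index per arm; unplayed arms get an infinite index
    # so they are explored first.
    return [m + math.sqrt(2 * math.log(t) / c) if c > 0 else float("inf")
            for m, c in zip(means, counts)]

def select_super_arm(means, counts, t, k):
    idx = ucb_indices(means, counts, t)
    # On a matroid, greedily adding the best arms that keep the set
    # independent is optimal; for the uniform matroid this is simply
    # the top-K indices.
    return sorted(range(len(means)), key=lambda a: idx[a], reverse=True)[:k]
```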
This study examines the global behavior of dynamics in learning in games between two players, X and Y. We consider the simplest situation for memory asymmetry between two players: X memorizes the other player Y's previous action and uses reactive strategies …
External link: http://arxiv.org/abs/2405.14546
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework …
External link: http://arxiv.org/abs/2404.13846
Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at the time of decoding. BoN sampling is susceptible to a problem known as reward hacking. Because …
External link: http://arxiv.org/abs/2404.01054
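The BoN procedure summarized above fits in a few lines. `generate` and `reward` below are toy stand-ins (a real setup samples completions from an LLM and scores them with a learned reward model); they are not from the paper.

```python
import random

def generate(prompt, rng):
    # Toy stand-in for sampling one completion from an LLM.
    return prompt + ": " + rng.choice(["A", "BB", "CCC"])

def reward(text):
    # Toy stand-in for a learned reward model (here: longer is better).
    return len(text)

def best_of_n(prompt, n, seed=0):
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    # Keep the candidate the reward model scores highest. As N grows,
    # this increasingly over-optimizes the proxy reward, which is the
    # reward-hacking failure mode the entry above refers to.
    return max(candidates, key=reward)
```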
Learning in games concerns the processes by which multiple players learn their optimal strategies through repeated game play. The dynamics of learning between two players in zero-sum games, such as matching pennies, where their benefits are conflicting …
External link: http://arxiv.org/abs/2402.10825
Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize …
External link: http://arxiv.org/abs/2402.03923
Author: Jinnai, Yuu; Ariu, Kaito
Minimum Bayes-Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding for a wide range of text generation tasks. However, MBR requires a huge amount of inference time to compute the MBR objective, which makes the method …
External link: http://arxiv.org/abs/2401.02749
Minimum Bayes Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding in a variety of text generation tasks. MBR decoding selects a hypothesis from a pool of hypotheses that has the least expected risk under a probability …
External link: http://arxiv.org/abs/2311.05263
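Both MBR entries describe the same selection rule: pick the pooled hypothesis with the highest expected utility (equivalently, least expected risk), treating the pool itself as samples from the model. In this sketch a toy word-overlap utility stands in for a real metric such as BLEU or BERTScore; note the quadratic number of utility calls over the pool, which is exactly the inference cost the first entry aims to reduce.

```python
def utility(hyp, ref):
    # Toy utility: Jaccard overlap of word sets (stand-in for BLEU etc.).
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r) if h | r else 0.0

def mbr_decode(hypotheses):
    # Score each hypothesis by its expected utility against the whole
    # pool (Monte Carlo estimate of risk under the model), then return
    # the hypothesis with the least expected risk.
    def expected_utility(h):
        return sum(utility(h, ref) for ref in hypotheses) / len(hypotheses)
    return max(hypotheses, key=expected_utility)
```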