Showing 1 - 10 of 63 for the search: '"Ariu, Kaito"'
This paper presents a payoff perturbation technique, introducing strong convexity to players' payoff functions in games. This technique is specifically designed for first-order methods to achieve last-iterate convergence in games where the gradient …
External link: http://arxiv.org/abs/2410.02388
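The perturbation idea summarized above can be sketched on a bilinear zero-sum game: each player's payoff gets a strongly concave (resp. convex) quadratic term, after which plain gradient dynamics, which would cycle or diverge on the unperturbed game, contract to the perturbed game's equilibrium in the last iterate. Everything here (the payoff matrix, step size, and origin anchor) is illustrative and not the paper's full method, which also handles recovering the original equilibrium.

```python
import numpy as np

def perturbed_gda(A, steps=2000, eta=0.05, mu=0.5):
    # Gradient ascent-descent on the bilinear game x^T A y, with each
    # payoff perturbed by a quadratic term anchored at the origin:
    #   X maximizes  x^T A y - (mu/2) ||x||^2
    #   Y minimizes  x^T A y + (mu/2) ||y||^2
    n, m = A.shape
    x, y = np.ones(n), np.ones(m)
    for _ in range(steps):
        gx = A @ y - mu * x        # X's perturbed gradient
        gy = -A.T @ x - mu * y     # Y's perturbed gradient
        x, y = x + eta * gx, y + eta * gy
    return x, y
```

With mu = 0 the iterates of this scheme spiral outward on a bilinear game; any mu > 0 makes the linear dynamics a strict contraction for a small enough step size.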
Synchronization behind Learning in Periodic Zero-Sum Games Triggers Divergence from Nash equilibrium
Learning in zero-sum games studies situations where multiple agents competitively learn their strategies. In such multi-agent learning, we often see that the strategies cycle around their optimum, i.e., the Nash equilibrium. When a game periodically varies …
External link: http://arxiv.org/abs/2408.10595
We study the matroid semi-bandits problem, where at each round the learner plays a subset of $K$ arms from a feasible set, and the goal is to maximize the expected cumulative linear rewards. Existing algorithms have per-round time complexity at least …
External link: http://arxiv.org/abs/2405.17968
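A minimal sketch of the per-round selection step for the simplest matroid, the uniform matroid (any subset of at most K arms is feasible), where the greedy rule reduces to taking the K largest indices. The UCB index and function names are illustrative; the paper's contribution concerns making this step faster for general matroids, which this sketch does not attempt.

```python
import math

def ucb_indices(means, counts, t):
    # Standard UCB index per arm; unplayed arms get an infinite index
    # so they are explored first.
    return [m + math.sqrt(2 * math.log(t) / c) if c > 0 else float("inf")
            for m, c in zip(means, counts)]

def select_super_arm(means, counts, t, k):
    idx = ucb_indices(means, counts, t)
    # On a matroid, greedily adding the best arms that keep the set
    # independent is optimal; for the uniform matroid this is simply
    # the top-K indices.
    return sorted(range(len(means)), key=lambda a: idx[a], reverse=True)[:k]
```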
This study examines the global behavior of dynamics in learning in games between two players, X and Y. We consider the simplest situation for memory asymmetry between two players: X memorizes the other player Y's previous action and uses reactive strategies …
External link: http://arxiv.org/abs/2405.14546
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework …
External link: http://arxiv.org/abs/2404.13846
Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) to human preferences at the time of decoding. BoN sampling is susceptible to a problem known as reward hacking. Because …
External link: http://arxiv.org/abs/2404.01054
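The BoN procedure summarized above fits in a few lines. `generate` and `reward` below are toy stand-ins (a real setup samples completions from an LLM and scores them with a learned reward model); they are not from the paper.

```python
import random

def generate(prompt, rng):
    # Toy stand-in for sampling one completion from an LLM.
    return prompt + ": " + rng.choice(["A", "BB", "CCC"])

def reward(text):
    # Toy stand-in for a learned reward model (here: longer is better).
    return len(text)

def best_of_n(prompt, n, seed=0):
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    # Keep the candidate the reward model scores highest. As N grows,
    # this increasingly over-optimizes the proxy reward, which is the
    # reward-hacking failure mode the entry above refers to.
    return max(candidates, key=reward)
```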
Learning in games concerns the processes by which multiple players learn their optimal strategies through repeated game play. The dynamics of learning between two players in zero-sum games, such as matching pennies, where their benefits are conflicting …
External link: http://arxiv.org/abs/2402.10825
Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize …
External link: http://arxiv.org/abs/2402.03923
Author: Jinnai, Yuu; Ariu, Kaito
Minimum Bayes-Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding for a wide range of text generation tasks. However, MBR requires a huge amount of inference time to compute the MBR objective, which makes the method …
External link: http://arxiv.org/abs/2401.02749
Minimum Bayes Risk (MBR) decoding has been shown to be a powerful alternative to beam search decoding in a variety of text generation tasks. MBR decoding selects a hypothesis from a pool of hypotheses that has the least expected risk under a probability …
External link: http://arxiv.org/abs/2311.05263
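Both MBR entries describe the same selection rule: pick the pooled hypothesis with the highest expected utility (equivalently, least expected risk), treating the pool itself as samples from the model. In this sketch a toy word-overlap utility stands in for a real metric such as BLEU or BERTScore; note the quadratic number of utility calls over the pool, which is exactly the inference cost the first entry aims to reduce.

```python
def utility(hyp, ref):
    # Toy utility: Jaccard overlap of word sets (stand-in for BLEU etc.).
    h, r = set(hyp.split()), set(ref.split())
    return len(h & r) / len(h | r) if h | r else 0.0

def mbr_decode(hypotheses):
    # Score each hypothesis by its expected utility against the whole
    # pool (Monte Carlo estimate of risk under the model), then return
    # the hypothesis with the least expected risk.
    def expected_utility(h):
        return sum(utility(h, ref) for ref in hypotheses) / len(hypotheses)
    return max(hypotheses, key=expected_utility)
```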