Showing 1 - 10 of 44 for the search: '"Abe, Kenshi"'
Auctions are among the most representative buying-and-selling mechanisms. A celebrated result shows that the seller's expected revenue in equilibrium is the same regardless of the auction format, typically first-price and second-price auctions. Here, however, …
External link:
http://arxiv.org/abs/2410.12306
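Revenue equivalence is easy to check numerically. The sketch below is an illustrative Monte Carlo (not taken from the paper): with two bidders whose values are i.i.d. Uniform(0, 1), the symmetric first-price equilibrium bid is v/2, while the second-price auction is truthful and the winner pays the loser's value; both formats yield an expected revenue near the theoretical 1/3.

```python
import random

random.seed(0)

def simulate(n_rounds=200_000):
    """Monte Carlo comparison of seller revenue in first- and second-price
    auctions with two bidders whose values are i.i.d. Uniform(0, 1)."""
    first_price_rev = 0.0
    second_price_rev = 0.0
    for _ in range(n_rounds):
        v1, v2 = random.random(), random.random()
        # First-price equilibrium bid with 2 bidders: b(v) = v / 2.
        first_price_rev += max(v1, v2) / 2
        # Second-price auction: truthful bids, winner pays the second-highest value.
        second_price_rev += min(v1, v2)
    return first_price_rev / n_rounds, second_price_rev / n_rounds

fp, sp = simulate()
print(fp, sp)  # both close to the theoretical value 1/3
```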
Mean Field Game (MFG) is a framework used to model and approximate the behavior of a large number of agents, and the computation of equilibria in MFGs has been a subject of interest. Despite the proposal of methods to approximate the equilibria, a…
External link:
http://arxiv.org/abs/2410.05127
This paper presents a payoff perturbation technique that introduces strong convexity into players' payoff functions in games. The technique is specifically designed for first-order methods to achieve last-iterate convergence in games where the gradient …
External link:
http://arxiv.org/abs/2410.02388
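To illustrate the idea of payoff perturbation (a toy sketch, not the paper's algorithm): in the unconstrained bilinear zero-sum game f(x, y) = x·y, simultaneous gradient descent-ascent spirals away from the equilibrium (0, 0), whereas subtracting a strongly convex quadratic term (μ/2)·x² from the maximizer's objective (and adding (μ/2)·y² to the minimizer's) makes the last iterate converge.

```python
def gda(steps, eta=0.1, mu=0.0, x=1.0, y=1.0):
    """Simultaneous gradient descent-ascent on the zero-sum game f(x, y) = x*y.
    Player x ascends on f - (mu/2)*x**2, player y descends on f + (mu/2)*y**2;
    mu > 0 is the strongly convex payoff perturbation pulling both toward 0."""
    for _ in range(steps):
        gx = y - mu * x  # gradient of the perturbed objective for x
        gy = x + mu * y  # gradient of the perturbed objective for y
        x, y = x + eta * gx, y - eta * gy
    return x, y

x0, y0 = gda(500)           # plain GDA: iterates spiral outward
x1, y1 = gda(500, mu=0.5)   # perturbed GDA: last iterate converges to (0, 0)
```

The linearized update map has spectral radius above 1 when mu = 0 and below 1 for this mu, which is why only the perturbed run contracts.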
Synchronization behind Learning in Periodic Zero-Sum Games Triggers Divergence from Nash equilibrium
Learning in zero-sum games studies situations where multiple agents competitively learn their strategies. In such multi-agent learning, the strategies often cycle around their optimum, i.e., the Nash equilibrium. When a game periodically varies …
External link:
http://arxiv.org/abs/2408.10595
This study examines the global behavior of learning dynamics in games between two players, X and Y. We consider the simplest situation of memory asymmetry between the two players: X memorizes the other player Y's previous action and uses reactive strategies …
External link:
http://arxiv.org/abs/2405.14546
Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework …
External link:
http://arxiv.org/abs/2404.13846
Best-of-N (BoN) sampling with a reward model has been shown to be an effective strategy for aligning Large Language Models (LLMs) with human preferences at decoding time. However, BoN sampling is susceptible to a problem known as reward hacking. Because …
External link:
http://arxiv.org/abs/2404.01054
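A minimal sketch of the BoN procedure (with hypothetical stand-ins, not the paper's setup): draw N candidates and keep the one the reward model scores highest. Reward hacking arises when the proxy reward diverges from true quality, since BoN optimizes the proxy by construction.

```python
def best_of_n(generate, reward, n=8):
    """Best-of-N sampling: draw n candidate responses and return the one
    that the reward model scores highest. 'generate' and 'reward' are
    hypothetical stand-ins for an LLM sampler and a learned reward model."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=reward)

# Toy usage: candidates are numbers and the "reward model" is the identity,
# so BoN simply returns the largest of the n draws.
samples = iter([0.2, 0.9, 0.5])
best = best_of_n(lambda: next(samples), lambda r: r, n=3)  # picks 0.9
```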
Typical recommendation and ranking methods aim to optimize user satisfaction, but they are often oblivious to their impact on the items (e.g., products, jobs, news, videos) and their providers. However, there has been a growing understanding that …
External link:
http://arxiv.org/abs/2402.14369
Learning in games studies the processes by which multiple players learn their optimal strategies through repeated play. The dynamics of learning between two players in zero-sum games, such as matching pennies, where their benefits are c…
External link:
http://arxiv.org/abs/2402.10825
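The cycling phenomenon is easy to reproduce. The sketch below is illustrative (Multiplicative Weights Update is one common choice of learning dynamic, and the step size and starting point are assumptions): in matching pennies the unique Nash equilibrium is (0.5, 0.5), and the last iterate orbits it and drifts outward rather than converging.

```python
import math

def mwu_matching_pennies(steps, eta=0.1, p=0.7, q=0.6):
    """Both players run Multiplicative Weights Update in matching pennies.
    p = P(X plays Heads), q = P(Y plays Heads); X wants to match Y, while
    Y wants to mismatch X. The unique Nash equilibrium is p = q = 0.5."""
    trajectory = [(p, q)]
    for _ in range(steps):
        ux_h = 2 * q - 1  # X's expected payoff for Heads (Tails is -ux_h)
        uy_h = 1 - 2 * p  # Y's expected payoff for Heads (Tails is -uy_h)
        wp = p * math.exp(eta * ux_h)
        wq = q * math.exp(eta * uy_h)
        p = wp / (wp + (1 - p) * math.exp(-eta * ux_h))
        q = wq / (wq + (1 - q) * math.exp(-eta * uy_h))
        trajectory.append((p, q))
    return trajectory

traj = mwu_matching_pennies(5000)  # last iterate ends up far from (0.5, 0.5)
```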
Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as the return. However, as applications broaden, it becomes increasingly crucial to train agents that not only maximize …
External link:
http://arxiv.org/abs/2402.03923