Showing 1 - 10 of 312 for search: '"Pietquin, Olivier"'
Author:
Grinsztajn, Nathan, Flet-Berliac, Yannis, Azar, Mohammad Gheshlaghi, Strub, Florian, Wu, Bill, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Pietquin, Olivier, Geist, Matthieu
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a f…
External link:
http://arxiv.org/abs/2406.19188
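The snippet above refers to the regularized RL step of RLHF. A standard form of that objective, written here in generic notation (the learned reward model r_phi, reference policy pi_ref, and regularization weight beta are common conventions, not symbols taken from this abstract), is:

\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\!\left[ r_{\phi}(x, y) \right] \;-\; \beta\, \mathrm{KL}\!\left( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)

Direct alignment methods such as DPO aim for a similar optimum while training directly on preference data, without the intermediate reward model.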
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Author:
Flet-Berliac, Yannis, Grinsztajn, Nathan, Strub, Florian, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Azar, Mohammad Gheshlaghi, Pietquin, Olivier, Geist, Matthieu
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more…
External link:
http://arxiv.org/abs/2406.19185
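As a rough illustration of what "aligning LLMs on sequence-level scores in a supervised-friendly fashion" can look like, below is a minimal, hypothetical sketch of a contrastive, sequence-level policy-gradient loss in which two sampled completions serve as each other's baseline. It is not the paper's CoPG objective; the tensor names (logp_a, score_a, etc.) and the pairing scheme are assumptions made for illustration only.

import torch

def contrastive_pg_loss(logp_a, logp_b, score_a, score_b):
    # logp_*: summed log-probabilities of two sampled completions under the
    # current policy; score_*: their sequence-level scores (e.g. reward-model outputs).
    # Each completion's advantage is its score minus the other's, so the loss
    # can be minimized with ordinary supervised-learning tooling.
    adv_a = score_a - score_b
    adv_b = score_b - score_a
    return -(adv_a * logp_a + adv_b * logp_b).mean()

# Usage with dummy values for a batch of two prompts:
loss = contrastive_pg_loss(torch.tensor([-12.3, -8.1]), torch.tensor([-10.5, -9.4]),
                           torch.tensor([0.7, 0.2]), torch.tensor([0.4, 0.6]))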
Author:
Rita, Mathieu, Strub, Florian, Chaabouni, Rahma, Michel, Paul, Dupoux, Emmanuel, Pietquin, Olivier
While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyper…
External link:
http://arxiv.org/abs/2404.19409
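The KL regularization mentioned in this snippet is commonly folded into the reward the policy is trained on. In generic notation (not taken from the paper), the shaped reward is

\tilde{r}(x, y) \;=\; r(x, y) \;-\; \beta \,\log \frac{\pi_{\theta}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},

where beta is exactly the kind of hyperparameter whose tuning the snippet describes as computationally expensive.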
Author:
Rita, Mathieu, Michel, Paul, Chaabouni, Rahma, Pietquin, Olivier, Dupoux, Emmanuel, Strub, Florian
Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several…
External link:
http://arxiv.org/abs/2403.11958
Author:
Wu, Zida, Lauriere, Mathieu, Chua, Samuel Jia Cong, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur
Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-depe…
External link:
http://arxiv.org/abs/2403.03552
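A population-dependent policy, as referenced in the snippet, conditions on the current population distribution in addition to the agent's own state. The sketch below is a minimal, hypothetical illustration for a finite-state MFG (the class name and layer sizes are arbitrary choices, not the paper's architecture).

import torch
import torch.nn as nn

class PopulationDependentPolicy(nn.Module):
    def __init__(self, n_states: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_states, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state_onehot: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
        # Concatenate the agent's own state with the population distribution mu
        # (a probability vector over states) and output action probabilities.
        return torch.softmax(self.net(torch.cat([state_onehot, mu], dim=-1)), dim=-1)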
Author:
Ahmadian, Arash, Cremer, Chris, Gallé, Matthias, Fadaee, Marzieh, Kreutzer, Julia, Pietquin, Olivier, Üstün, Ahmet, Hooker, Sara
AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as…
External link:
http://arxiv.org/abs/2402.14740
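For reference, the Proximal Policy Optimization named in this snippet maximizes the standard clipped surrogate objective (Schulman et al.; generic form, not specific to this paper):

L^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\!\left[ \min\!\big( r_t(\theta)\, A_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\, A_t \big) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.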
Author:
Cideron, Geoffrey, Girgin, Sertan, Verzetti, Mauro, Vincent, Damien, Kastelic, Matej, Borsos, Zalán, McWilliams, Brian, Ungureanu, Victor, Bachem, Olivier, Pietquin, Olivier, Geist, Matthieu, Hussenot, Léonard, Zeghidour, Neil, Agostinelli, Andrea
We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent…
External link:
http://arxiv.org/abs/2402.04229
Author:
Cui, Kai, Dayanıklı, Gökçe, Laurière, Mathieu, Geist, Matthieu, Pietquin, Olivier, Koeppl, Heinz
Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and cannot model m…
External link:
http://arxiv.org/abs/2312.10787
Author:
Pignatelli, Eduardo, Ferret, Johan, Geist, Matthieu, Mesnard, Thomas, van Hasselt, Hado, Pietquin, Olivier, Toni, Laura
The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the re…
External link:
http://arxiv.org/abs/2312.01072
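A concrete toy instance of the credit assignment problem described above: when a reward arrives only at the end of an episode, earlier actions receive credit only through the discounted return. The short sketch below (a generic example, not from the paper) computes those returns.

def discounted_returns(rewards, gamma=0.99):
    # G_t = sum_{k >= 0} gamma**k * r_{t+k}: each step is credited with all
    # later, discounted rewards, the simplest credit-assignment rule.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A single delayed reward at the final step still propagates credit backwards:
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))  # [0.9703, 0.9801, 0.99, 1.0]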
Author:
Ramponi, Giorgia, Kolev, Pavel, Pietquin, Olivier, He, Niao, Laurière, Mathieu, Geist, Matthieu
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs…
External link:
http://arxiv.org/abs/2306.14799
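As a simple point of reference for imitation learning in this mean-field setting, a behavioural-cloning baseline would fit a population-dependent policy to expert demonstrations of (state, population distribution, action) triples. The sketch below is such a hypothetical baseline, not the method studied in the paper; it reuses the PopulationDependentPolicy class sketched earlier in this listing.

import torch
import torch.nn.functional as F

def behavioural_cloning_loss(policy, states, mus, expert_actions):
    # states: one-hot agent states; mus: population distributions over states;
    # expert_actions: integer action indices taken by the (assumed Nash) expert.
    probs = policy(states, mus)
    return F.nll_loss(torch.log(probs + 1e-8), expert_actions)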