Showing 1 - 10 of 312 for search: '"Pietquin, Olivier"'
Author:
Grinsztajn, Nathan, Flet-Berliac, Yannis, Azar, Mohammad Gheshlaghi, Strub, Florian, Wu, Bill, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Pietquin, Olivier, Geist, Matthieu
To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a f…
External link:
http://arxiv.org/abs/2406.19188
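The snippet above refers to the regularized RL step of RLHF. A standard form of that objective, written here in generic notation (the learned reward model r_phi, reference policy pi_ref, and regularization weight beta are common conventions, not symbols taken from this abstract), is:

\max_{\pi}\; \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}\!\left[ r_{\phi}(x, y) \right] \;-\; \beta\, \mathrm{KL}\!\left( \pi(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)

Direct alignment methods such as DPO aim for a similar optimum while training directly on preference data, without the intermediate reward model.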
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion
Author:
Flet-Berliac, Yannis, Grinsztajn, Nathan, Strub, Florian, Choi, Eugene, Cremer, Chris, Ahmadian, Arash, Chandak, Yash, Azar, Mohammad Gheshlaghi, Pietquin, Olivier, Geist, Matthieu
Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more…
External link:
http://arxiv.org/abs/2406.19185
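As a rough illustration of what "aligning LLMs on sequence-level scores in a supervised-friendly fashion" can look like, below is a minimal, hypothetical sketch of a contrastive, sequence-level policy-gradient loss in which two sampled completions serve as each other's baseline. It is not the paper's CoPG objective; the tensor names (logp_a, score_a, etc.) and the pairing scheme are assumptions made for illustration only.

import torch

def contrastive_pg_loss(logp_a, logp_b, score_a, score_b):
    # logp_*: summed log-probabilities of two sampled completions under the
    # current policy; score_*: their sequence-level scores (e.g. reward-model outputs).
    # Each completion's advantage is its score minus the other's, so the loss
    # can be minimized with ordinary supervised-learning tooling.
    adv_a = score_a - score_b
    adv_b = score_b - score_a
    return -(adv_a * logp_a + adv_b * logp_b).mean()

# Usage with dummy values for a batch of two prompts:
loss = contrastive_pg_loss(torch.tensor([-12.3, -8.1]), torch.tensor([-10.5, -9.4]),
                           torch.tensor([0.7, 0.2]), torch.tensor([0.4, 0.6]))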
Author:
Rita, Mathieu, Strub, Florian, Chaabouni, Rahma, Michel, Paul, Dupoux, Emmanuel, Pietquin, Olivier
While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyper…
External link:
http://arxiv.org/abs/2404.19409
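The KL regularization mentioned in this snippet is commonly folded into the reward the policy is trained on. In generic notation (not taken from the paper), the shaped reward is

\tilde{r}(x, y) \;=\; r(x, y) \;-\; \beta \,\log \frac{\pi_{\theta}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},

where beta is exactly the kind of hyperparameter whose tuning the snippet describes as computationally expensive.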
Author:
Rita, Mathieu, Michel, Paul, Chaabouni, Rahma, Pietquin, Olivier, Dupoux, Emmanuel, Strub, Florian
Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several…
External link:
http://arxiv.org/abs/2403.11958
Author:
Wu, Zida, Lauriere, Mathieu, Chua, Samuel Jia Cong, Geist, Matthieu, Pietquin, Olivier, Mehta, Ankur
Mean Field Games (MFGs) have the ability to handle large-scale multi-agent systems, but learning Nash equilibria in MFGs remains a challenging task. In this paper, we propose a deep reinforcement learning (DRL) algorithm that achieves population-depe…
External link:
http://arxiv.org/abs/2403.03552
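A population-dependent policy, as referenced in the snippet, conditions on the current population distribution in addition to the agent's own state. The sketch below is a minimal, hypothetical illustration for a finite-state MFG (the class name and layer sizes are arbitrary choices, not the paper's architecture).

import torch
import torch.nn as nn

class PopulationDependentPolicy(nn.Module):
    def __init__(self, n_states: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_states, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state_onehot: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
        # Concatenate the agent's own state with the population distribution mu
        # (a probability vector over states) and output action probabilities.
        return torch.softmax(self.net(torch.cat([state_onehot, mu], dim=-1)), dim=-1)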
Author:
Ahmadian, Arash, Cremer, Chris, Gallé, Matthias, Fadaee, Marzieh, Kreutzer, Julia, Pietquin, Olivier, Üstün, Ahmet, Hooker, Sara
AI alignment in the shape of Reinforcement Learning from Human Feedback (RLHF) is increasingly treated as a crucial ingredient for high performance large language models. Proximal Policy Optimization (PPO) has been positioned by recent literature as…
External link:
http://arxiv.org/abs/2402.14740
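For reference, the Proximal Policy Optimization named in this snippet maximizes the standard clipped surrogate objective (Schulman et al.; generic form, not specific to this paper):

L^{\mathrm{CLIP}}(\theta) \;=\; \mathbb{E}_t\!\left[ \min\!\big( r_t(\theta)\, A_t,\; \mathrm{clip}\big(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\big)\, A_t \big) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}.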
Author:
Cideron, Geoffrey, Girgin, Sertan, Verzetti, Mauro, Vincent, Damien, Kastelic, Matej, Borsos, Zalán, McWilliams, Brian, Ungureanu, Victor, Bachem, Olivier, Pietquin, Olivier, Geist, Matthieu, Hussenot, Léonard, Zeghidour, Neil, Agostinelli, Andrea
We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent…
External link:
http://arxiv.org/abs/2402.04229
Author:
Cui, Kai, Dayanıklı, Gökçe, Laurière, Mathieu, Geist, Matthieu, Pietquin, Olivier, Koeppl, Heinz
Recent techniques based on Mean Field Games (MFGs) allow the scalable analysis of multi-player games with many similar, rational agents. However, standard MFGs remain limited to homogeneous players that weakly influence each other, and cannot model m…
External link:
http://arxiv.org/abs/2312.10787
Author:
Pignatelli, Eduardo, Ferret, Johan, Geist, Matthieu, Mesnard, Thomas, van Hasselt, Hado, Pietquin, Olivier, Toni, Laura
The Credit Assignment Problem (CAP) refers to the longstanding challenge of Reinforcement Learning (RL) agents to associate actions with their long-term consequences. Solving the CAP is a crucial step towards the successful deployment of RL in the re…
External link:
http://arxiv.org/abs/2312.01072
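A concrete toy instance of the credit assignment problem described above: when a reward arrives only at the end of an episode, earlier actions receive credit only through the discounted return. The short sketch below (a generic example, not from the paper) computes those returns.

def discounted_returns(rewards, gamma=0.99):
    # G_t = sum_{k >= 0} gamma**k * r_{t+k}: each step is credited with all
    # later, discounted rewards, the simplest credit-assignment rule.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# A single delayed reward at the final step still propagates credit backwards:
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))  # [0.9703, 0.9801, 0.99, 1.0]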
Author:
Ramponi, Giorgia, Kolev, Pavel, Pietquin, Olivier, He, Niao, Laurière, Mathieu, Geist, Matthieu
We explore the problem of imitation learning (IL) in the context of mean-field games (MFGs), where the goal is to imitate the behavior of a population of agents following a Nash equilibrium policy according to some unknown payoff function. IL in MFGs…
External link:
http://arxiv.org/abs/2306.14799
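As a simple point of reference for imitation learning in this mean-field setting, a behavioural-cloning baseline would fit a population-dependent policy to expert demonstrations of (state, population distribution, action) triples. The sketch below is such a hypothetical baseline, not the method studied in the paper; it reuses the PopulationDependentPolicy class sketched earlier in this listing.

import torch
import torch.nn.functional as F

def behavioural_cloning_loss(policy, states, mus, expert_actions):
    # states: one-hot agent states; mus: population distributions over states;
    # expert_actions: integer action indices taken by the (assumed Nash) expert.
    probs = policy(states, mus)
    return F.nll_loss(torch.log(probs + 1e-8), expert_actions)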