Výsledky vyhledávání - "Rame, Alexandre"

Report

WARP: On the Benefits of Weight Averaged Rewarded Policies

Autor: Ramé, Alexandre, Ferret, Johan, Vieillard, Nino, Dadashi, Robert, Hussenot, Léonard, Cedoz, Pierre-Louis, Sessa, Pier Giuseppe, Girgin, Sertan, Douillard, Arthur, Bachem, Olivier

Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs) by encouraging their generations to have high rewards, using a reward model trained on human preferences. To prevent the forgetting of pre-trained knowledge, RLHF u

Externí odkaz: http://arxiv.org/abs/2406.16768

Zobrazit plný text záznamu

Report

Direct Language Model Alignment from Online AI Feedback

Autor: Guo, Shangmin, Zhang, Biao, Liu, Tianlin, Liu, Tianqi, Khalman, Misha, Llinares, Felipe, Rame, Alexandre, Mesnard, Thomas, Zhao, Yao, Piot, Bilal, Ferret, Johan, Blondel, Mathieu

Direct alignment from preferences (DAP) methods, such as DPO, have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF), that do not require a separate reward model. However, the preference datasets used in

Externí odkaz: http://arxiv.org/abs/2402.04792

Zobrazit plný text záznamu

Report

WARM: On the Benefits of Weight Averaged Reward Models

Autor: Ramé, Alexandre, Vieillard, Nino, Hussenot, Léonard, Dadashi, Robert, Cideron, Geoffrey, Bachem, Olivier, Ferret, Johan

Aligning large language models (LLMs) with human preferences through reinforcement learning (RLHF) can lead to reward hacking, where LLMs exploit failures in the reward model (RM) to achieve seemingly high rewards without meeting the underlying objec

Externí odkaz: http://arxiv.org/abs/2401.12187

Zobrazit plný text záznamu

Report

Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning

Autor: Shukor, Mustafa, Rame, Alexandre, Dancette, Corentin, Cord, Matthieu

Following the success of Large Language Models (LLMs), Large Multimodal Models (LMMs), such as the Flamingo model and its subsequent competitors, have started to emerge as natural steps towards generalist agents. However, interacting with recent LMMs

Externí odkaz: http://arxiv.org/abs/2310.00647

Zobrazit plný text záznamu

Report

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks

Autor: Shukor, Mustafa, Dancette, Corentin, Rame, Alexandre, Cord, Matthieu

Large Language Models (LLMs) have made the ambitious quest for generalist agents significantly far from being a fantasy. A key hurdle for building such general models is the diversity and heterogeneity of tasks and modalities. A promising solution is

Externí odkaz: http://arxiv.org/abs/2307.16184

Zobrazit plný text záznamu

Report

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Autor: Ramé, Alexandre, Couairon, Guillaume, Shukor, Mustafa, Dancette, Corentin, Gaya, Jean-Baptiste, Soulier, Laure, Cord, Matthieu

Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further align the network with the intended usage. Yet the imperfections in the

Externí odkaz: http://arxiv.org/abs/2306.04488

Zobrazit plný text záznamu

Report

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization

Autor: Ramé, Alexandre, Ahuja, Kartik, Zhang, Jianyu, Cord, Matthieu, Bottou, Léon, Lopez-Paz, David

Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions: from a pre-trained foundation model, they fine-tune the weights on the target task of interest. So, th

Externí odkaz: http://arxiv.org/abs/2212.10445

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání