Výsledky vyhledávání - "P, Courville"

Report

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Autor: Noukhovitch, Michael, Huang, Shengyi, Xhonneux, Sophie, Hosseini, Arian, Agarwal, Rishabh, Courville, Aaron

The dominant paradigm for RLHF is online and on-policy RL: synchronously generating from the large language model (LLM) policy, labelling with a reward model, and learning using feedback on the LLM's own outputs. While performant, this paradigm is co

Externí odkaz: http://arxiv.org/abs/2410.18252

Zobrazit plný text záznamu

Report

Stick-breaking Attention

Autor: Tan, Shawn, Shen, Yikang, Yang, Songlin, Courville, Aaron, Panda, Rameswar

The self-attention mechanism traditionally relies on the softmax operator, necessitating positional embeddings like RoPE, or position biases to account for token order. But current methods using still face length generalisation challenges. We propose

Externí odkaz: http://arxiv.org/abs/2410.17980

Zobrazit plný text záznamu

Report

Neuroplastic Expansion in Deep Reinforcement Learning

Autor: Liu, Jiashun, Obando-Ceron, Johan, Courville, Aaron, Pan, Ling

The loss of plasticity in learning agents, analogous to the solidification of neural pathways in biological brains, significantly impedes learning and adaptation in reinforcement learning due to its non-stationary nature. To address this fundamental

Externí odkaz: http://arxiv.org/abs/2410.07994

Zobrazit plný text záznamu

Report

Not All LLM Reasoners Are Created Equal

Autor: Hosseini, Arian, Sordoni, Alessandro, Toyama, Daniel, Courville, Aaron, Agarwal, Rishabh

We study the depth of grade-school math (GSM) problem-solving capabilities of LLMs. To this end, we evaluate their performance on pairs of existing math word problems together so that the answer to the second problem depends on correctly answering th

Externí odkaz: http://arxiv.org/abs/2410.01748

Zobrazit plný text záznamu

Report

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Autor: Kazemnejad, Amirhossein, Aghajohari, Milad, Portelance, Eva, Sordoni, Alessandro, Reddy, Siva, Courville, Aaron, Roux, Nicolas Le

Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal

Externí odkaz: http://arxiv.org/abs/2410.01679

Zobrazit plný text záznamu

Report

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Autor: Sokar, Ghada, Obando-Ceron, Johan, Courville, Aaron, Larochelle, Hugo, Castro, Pablo Samuel

The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reaso

Externí odkaz: http://arxiv.org/abs/2410.01930

Zobrazit plný text záznamu

Report

Managing multiple agents by automatically adjusting incentives

Autor: Akatsuka, Shunichi, Teramoto, Yaemi, Courville, Aaron

In the coming years, AI agents will be used for making more complex decisions, including in situations involving many different groups of people. One big challenge is that AI agent tends to act in its own interest, unlike humans who often think about

Externí odkaz: http://arxiv.org/abs/2409.02960

Zobrazit plný text záznamu

Report

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

Autor: Nguyen, Bac, Uhlich, Stefan, Cardinaux, Fabien, Mauch, Lukas, Edraki, Marzieh, Courville, Aaron

Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-s

Externí odkaz: http://arxiv.org/abs/2407.03036

Zobrazit plný text záznamu

Report

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Autor: Obando-Ceron, Johan, Araújo, João G. M., Courville, Aaron, Castro, Pablo Samuel

Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements bu

Externí odkaz: http://arxiv.org/abs/2406.17523

Zobrazit plný text záznamu

Report

GenRL: Multimodal-foundation world models for generalization in embodied agents

Autor: Mazzaglia, Pietro, Verbelen, Tim, Dhoedt, Bart, Courville, Aaron, Rajeswar, Sai

Learning generalist embodied agents, able to solve multitudes of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can sp

Externí odkaz: http://arxiv.org/abs/2406.18043

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání