Showing 1 - 10 of 3,203 for search: '"P, Courville"'
Author:
Noukhovitch, Michael, Huang, Shengyi, Xhonneux, Sophie, Hosseini, Arian, Agarwal, Rishabh, Courville, Aaron
The dominant paradigm for RLHF is online and on-policy RL: synchronously generating from the large language model (LLM) policy, labelling with a reward model, and learning using feedback on the LLM's own outputs. While performant, this paradigm is co… (see the sketch below)
External link:
http://arxiv.org/abs/2410.18252
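As context for the generate-label-learn loop this abstract describes, here is a minimal sketch of synchronous, on-policy RLHF. The tiny policy, the toy reward model, and the plain REINFORCE update are hypothetical stand-ins, not the paper's implementation (the paper studies relaxing exactly this synchrony).

```python
# Minimal sketch of a synchronous on-policy RLHF loop.
# TinyPolicy and toy_reward are hypothetical stand-ins for an LLM policy
# and a learned reward model; the update is plain REINFORCE.
import torch

vocab, hidden = 32, 16
policy = torch.nn.Sequential(torch.nn.Embedding(vocab, hidden),
                             torch.nn.Linear(hidden, vocab))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def toy_reward(tokens):
    # Stand-in for a reward model: prefers even token ids.
    return (tokens % 2 == 0).float().mean(dim=-1)

for step in range(100):
    prompt = torch.randint(vocab, (8, 1))            # batch of prompts
    logits = policy(prompt).squeeze(1)               # (8, vocab)
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                           # 1. generate from the current policy
    r = toy_reward(action.unsqueeze(-1))             # 2. label with the reward model
    loss = -(dist.log_prob(action) * (r - r.mean())).mean()  # 3. on-policy update
    opt.zero_grad(); loss.backward(); opt.step()
```

Note how step 3 only ever uses samples drawn from the current policy in step 1; that coupling is what makes the loop synchronous and on-policy.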
The self-attention mechanism traditionally relies on the softmax operator, necessitating positional embeddings like RoPE, or position biases to account for token order. But current methods still face length generalisation challenges. We propose… (see the sketch below)
External link:
http://arxiv.org/abs/2410.17980
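To make the abstract's premise concrete: softmax attention is order-invariant on its own, which is why schemes like RoPE rotate queries and keys by a position-dependent angle. Below is a minimal single-head sketch of that baseline (a common de-interleaved RoPE variant), not the method this paper proposes.

```python
# Softmax attention carries no token-order information by itself; RoPE injects
# it by rotating query/key feature pairs by an angle proportional to position.
import torch

def rope(x):
    # x: (seq, dim) with dim even; rotate each feature pair by pos * theta
    seq, dim = x.shape
    pos = torch.arange(seq, dtype=torch.float32).unsqueeze(-1)     # (seq, 1)
    theta = 10000 ** (-torch.arange(0, dim, 2).float() / dim)      # (dim/2,)
    ang = pos * theta                                              # (seq, dim/2)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    return torch.cat([x1 * ang.cos() - x2 * ang.sin(),
                      x1 * ang.sin() + x2 * ang.cos()], dim=-1)

q, k = torch.randn(5, 8), torch.randn(5, 8)
attn_no_pos = torch.softmax(q @ k.T / 8 ** 0.5, dim=-1)  # permuting tokens just permutes scores
attn_rope = torch.softmax(rope(q) @ rope(k).T / 8 ** 0.5, dim=-1)  # position-aware
```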
The loss of plasticity in learning agents, analogous to the solidification of neural pathways in biological brains, significantly impedes learning and adaptation in reinforcement learning due to its non-stationary nature. To address this fundamental… (see the sketch below)
External link:
http://arxiv.org/abs/2410.07994
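The snippet cuts off before the paper's proposed fix. For background only, one common mitigation in the plasticity literature is to reinitialize "dormant" units whose activations have collapsed; the sketch below illustrates that generic idea and is not this paper's method. The threshold `tau` is a hypothetical parameter.

```python
# Generic plasticity mitigation: reset units that have gone dormant.
# Illustrative only; not the approach proposed in this paper.
import torch

def reset_dormant_units(layer: torch.nn.Linear, acts: torch.Tensor, tau=0.01):
    # acts: (batch, out_features) post-activation outputs of this layer
    score = acts.abs().mean(dim=0)
    score = score / (score.mean() + 1e-8)      # normalized per-unit activity
    dormant = score < tau                      # units that barely fire
    with torch.no_grad():
        fresh = torch.nn.Linear(layer.in_features, layer.out_features)
        layer.weight[dormant] = fresh.weight[dormant]   # reinitialize dead rows
        layer.bias[dormant] = fresh.bias[dormant]
    return int(dormant.sum())
```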
We study the depth of grade-school math (GSM) problem-solving capabilities of LLMs. To this end, we evaluate their performance on pairs of existing math word problems together so that the answer to the second problem depends on correctly answering the… (see the sketch below)
External link:
http://arxiv.org/abs/2410.01748
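A toy version of the evaluation idea the abstract describes, chaining two problems so the second depends on the first's answer. The templates and numbers here are hypothetical, not drawn from the paper's dataset.

```python
# Chain two grade-school problems: problem 2 can only be solved
# after problem 1 has been answered correctly.
def compose_pair():
    p1 = "Ali has 4 boxes with 6 apples each. How many apples does Ali have?"
    a1 = 4 * 6
    p2 = (f"Problem 1: {p1}\n"
          f"Problem 2: Sara has twice as many apples as Ali's answer above. "
          f"How many apples does Sara have?")
    a2 = 2 * a1
    return p2, a2

prompt, gold = compose_pair()
print(prompt, "\nexpected:", gold)
```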
Autor:
Kazemnejad, Amirhossein, Aghajohari, Milad, Portelance, Eva, Sordoni, Alessandro, Reddy, Siva, Courville, Aaron, Roux, Nicolas Le
Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal Policy Optimization (PPO)… (see the sketch below)
External link:
http://arxiv.org/abs/2410.01679
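For context on the credit-assignment point: PPO typically spreads per-step credit with generalized advantage estimation (GAE). The sketch below is standard GAE, shown as the baseline the abstract refers to, not the refinement this paper proposes.

```python
# Standard generalized advantage estimation (GAE): per-step credit
# computed by discounting temporal-difference errors backwards in time.
import torch

def gae(rewards, values, gamma=0.99, lam=0.95):
    # rewards: (T,); values: (T+1,) value estimates including the bootstrap
    T = rewards.shape[0]
    adv = torch.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv   # per-step credit used to weight the PPO policy loss

print(gae(torch.tensor([0., 0., 1.]), torch.tensor([0.1, 0.2, 0.5, 0.0])))
```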
The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons… (see the sketch below)
External link:
http://arxiv.org/abs/2410.01930
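A minimal soft mixture-of-experts layer in the style of Puigcerver et al., the mechanism this abstract builds on: tokens are softly dispatched to expert "slots" and softly combined back, so every expert sees a weighted blend of all tokens. An illustrative sketch, not the paper's architecture.

```python
# SoftMoE: soft dispatch of tokens to expert slots, soft combine back.
import torch

class SoftMoE(torch.nn.Module):
    def __init__(self, dim, n_experts, n_slots):
        super().__init__()
        self.phi = torch.nn.Parameter(torch.randn(dim, n_experts * n_slots) / dim ** 0.5)
        self.experts = torch.nn.ModuleList(
            [torch.nn.Linear(dim, dim) for _ in range(n_experts)])
        self.n_experts, self.n_slots = n_experts, n_slots

    def forward(self, x):                   # x: (tokens, dim)
        logits = x @ self.phi               # (tokens, experts * slots)
        dispatch = logits.softmax(dim=0)    # per slot: weights over tokens
        combine = logits.softmax(dim=1)     # per token: weights over slots
        slots = dispatch.T @ x              # (experts * slots, dim)
        slots = slots.view(self.n_experts, self.n_slots, -1)
        outs = torch.stack([e(slots[i]) for i, e in enumerate(self.experts)])
        return combine @ outs.reshape(-1, x.shape[-1])  # back to (tokens, dim)

y = SoftMoE(dim=16, n_experts=4, n_slots=2)(torch.randn(10, 16))
```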
In the coming years, AI agents will be used for making more complex decisions, including in situations involving many different groups of people. One big challenge is that an AI agent tends to act in its own interest, unlike humans, who often think about…
External link:
http://arxiv.org/abs/2409.02960
Author:
Nguyen, Bac, Uhlich, Stefan, Cardinaux, Fabien, Mauch, Lukas, Edraki, Marzieh, Courville, Aaron
Handling distribution shifts from training data, known as out-of-distribution (OOD) generalization, poses a significant challenge in the field of machine learning. While a pre-trained vision-language model like CLIP has demonstrated remarkable zero-shot… (see the sketch below)
External link:
http://arxiv.org/abs/2407.03036
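A sketch of the CLIP-style zero-shot classification the abstract refers to: score an image against natural-language class prompts in a shared embedding space. The `encode_image` / `encode_text` functions here are random-output stand-ins for CLIP's actual encoders.

```python
# CLIP-style zero-shot classification: compare an image embedding against
# text-prompt embeddings in a shared space. Encoders below are stand-ins.
import torch

def encode_image(img):        # stand-in for CLIP's image encoder
    return torch.nn.functional.normalize(torch.randn(512), dim=0)

def encode_text(prompt):      # stand-in for CLIP's text encoder
    return torch.nn.functional.normalize(torch.randn(512), dim=0)

classes = ["dog", "cat", "car"]
texts = torch.stack([encode_text(f"a photo of a {c}") for c in classes])
img = encode_image(None)
probs = (100.0 * texts @ img).softmax(dim=0)   # cosine similarities -> class probs
print(classes[probs.argmax()])
```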
Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements but…
External link:
http://arxiv.org/abs/2406.17523
Learning generalist embodied agents able to solve a multitude of tasks in different domains is a long-standing problem. Reinforcement learning (RL) is hard to scale up as it requires a complex reward design for each task. In contrast, language can specify… (see the sketch below)
External link:
http://arxiv.org/abs/2406.18043
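To illustrate the contrast the abstract draws, the sketch below conditions a single policy on a language instruction instead of relying on a hand-designed reward per task. The bag-of-words instruction encoder and MLP head are hypothetical stand-ins, not the paper's architecture.

```python
# Language-conditioned policy: one network, task specified by the instruction.
import torch

class LangConditionedPolicy(torch.nn.Module):
    def __init__(self, obs_dim=8, lang_dim=16, n_actions=4, vocab=100):
        super().__init__()
        self.embed = torch.nn.EmbeddingBag(vocab, lang_dim)  # toy instruction encoder
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim + lang_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, n_actions))

    def forward(self, obs, instruction_tokens):
        z = self.embed(instruction_tokens)               # (batch, lang_dim)
        return self.net(torch.cat([obs, z], dim=-1))     # action logits

policy = LangConditionedPolicy()
logits = policy(torch.randn(2, 8), torch.randint(100, (2, 5)))
```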