Showing 1 - 10 of 312 for search: '"Roux, Nicolás"'
Humans have the ability to learn new tasks by inferring high-level concepts from existing solutions, then manipulating these concepts in lieu of the raw data. Can we automate this process by deriving latent semantic structures in a document collection…
External link:
http://arxiv.org/abs/2410.05481
Author:
Kazemnejad, Amirhossein, Aghajohari, Milad, Portelance, Eva, Sordoni, Alessandro, Reddy, Siva, Courville, Aaron, Roux, Nicolas Le
Large language models (LLMs) are increasingly applied to complex reasoning tasks that require executing several complex steps before receiving any reward. Properly assigning credit to these steps is essential for enhancing model performance. Proximal…
External link:
http://arxiv.org/abs/2410.01679
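The credit-assignment abstract above describes multi-step reasoning where reward only arrives at the end. As a loosely related illustration rather than the paper's actual algorithm, the sketch below credits each intermediate step with the change in a Monte Carlo value estimate of the partial solution; `rollout`, `mc_value`, and the toy data are all hypothetical stand-ins.

```python
import random
from typing import Callable, List

def mc_value(state: str, rollout: Callable[[str], float], n: int = 8) -> float:
    """Monte Carlo estimate of the expected final reward from a partial solution."""
    return sum(rollout(state) for _ in range(n)) / n

def stepwise_advantages(states: List[str], rollout: Callable[[str], float]) -> List[float]:
    """Credit each reasoning step with the change in estimated value it produces.

    `states` holds the partial solution after each step; `rollout` (hypothetical)
    samples a completion from a partial solution and returns its final reward.
    """
    values = [mc_value(s, rollout) for s in states]
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without a model: longer partial solutions
    # succeed more often under the toy rollout.
    partials = ["step1", "step1 step2", "step1 step2 step3"]
    toy_rollout = lambda s: float(random.random() < 0.25 * len(s.split()))
    print(stepwise_advantages(partials, toy_rollout))
```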
While finetuning language models from pairwise preferences has proven remarkably effective, the underspecified nature of natural language presents critical challenges. Direct preference feedback is uninterpretable, difficult to provide where multidimensional…
External link:
http://arxiv.org/abs/2407.14916
Author:
Ostapenko, Oleksiy, Su, Zhan, Ponti, Edoardo Maria, Charlin, Laurent, Roux, Nicolas Le, Pereira, Matheus, Caccia, Lucas, Sordoni, Alessandro
The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks. We study how to best build a library of adapters given mult…
External link:
http://arxiv.org/abs/2405.11157
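The abstract above asks how to reuse a library of trained adapters for new tasks. The sketch below is not the routing studied in the paper; it only illustrates the general idea of scoring library adapters against a new task and mixing the best matches, with the prototype vectors, cosine similarity, and softmax mixing all being illustrative assumptions.

```python
import numpy as np

def route_to_adapter_library(task_embedding: np.ndarray,
                             adapter_prototypes: dict,
                             top_k: int = 2) -> dict:
    """Score library adapters against a new task and return mixing weights for the best matches.

    `adapter_prototypes` maps an adapter name to a vector summarizing the task it was
    trained on; the names, prototypes, and similarity rule are illustrative placeholders.
    """
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    sims = {name: cosine(task_embedding, proto) for name, proto in adapter_prototypes.items()}
    chosen = sorted(sims, key=sims.get, reverse=True)[:top_k]
    scores = np.array([sims[name] for name in chosen])
    weights = np.exp(scores) / np.exp(scores).sum()    # softmax over the selected adapters
    return dict(zip(chosen, weights.tolist()))

if __name__ == "__main__":
    library = {
        "qa": np.array([1.0, 0.1, 0.0]),
        "summarization": np.array([0.1, 1.0, 0.0]),
        "code": np.array([0.6, 0.6, 0.5]),
    }
    print(route_to_adapter_library(np.array([0.9, 0.3, 0.1]), library))
```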
Author:
Fu, Haotian, Sharma, Pratyusha, Stengel-Eskin, Elias, Konidaris, George, Roux, Nicolas Le, Côté, Marc-Alexandre, Yuan, Xingdi
We present an algorithm for skill discovery from expert demonstrations. The algorithm first utilizes Large Language Models (LLMs) to propose an initial segmentation of the trajectories. Following that, a hierarchical variational inference framework is…
External link:
http://arxiv.org/abs/2402.16354
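The skill-discovery abstract describes a two-stage pipeline: an LLM proposes a segmentation of the demonstrations, which a second stage then refines. The skeleton below only mirrors that control flow; `llm_segment` and `refine` are hypothetical callables, not the paper's prompt or its variational-inference procedure.

```python
from typing import Callable, List, Sequence, Tuple

Trajectory = List[str]                 # a demonstration as a list of low-level actions
Segmentation = List[Tuple[int, int]]   # (start, end) indices of proposed skill segments

def discover_skills(trajectories: Sequence[Trajectory],
                    llm_segment: Callable[[Trajectory], Segmentation],
                    refine: Callable[[Sequence[Trajectory], List[Segmentation]], List[Segmentation]],
                    ) -> List[Segmentation]:
    """Two-stage skeleton: an LLM proposes segment boundaries, a second stage refines them."""
    initial = [llm_segment(t) for t in trajectories]   # stage 1: per-trajectory proposals
    return refine(trajectories, initial)               # stage 2: joint refinement

if __name__ == "__main__":
    # Toy stand-ins: split each demonstration in half, then keep the proposals unchanged.
    halve = lambda t: [(0, len(t) // 2), (len(t) // 2, len(t))]
    keep = lambda trajs, segs: segs
    print(discover_skills([["open", "walk", "grasp", "lift"]], halve, keep))
```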
Author:
Sordoni, Alessandro, Yuan, Xingdi, Côté, Marc-Alexandre, Pereira, Matheus, Trischler, Adam, Xiao, Ziang, Hosseini, Arian, Niedtner, Friederike, Roux, Nicolas Le
Large language models (LLMs) can be seen as atomic units of computation mapping sequences to a distribution over sequences. Thus, they can be seen as stochastic language layers in a language network, where the learnable parameters are the natural language…
External link:
http://arxiv.org/abs/2306.12509
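The abstract above views an LLM call as a stochastic layer whose learnable parameters are natural-language prompts. Below is a minimal sketch of that framing, assuming a hypothetical `LLM` callable; it shows two prompt-parameterized layers composed like a tiny network, not the paper's actual implementation.

```python
from typing import Callable

LLM = Callable[[str], str]   # hypothetical stand-in for a (stochastic) call to a language model

class LanguageLayer:
    """A 'language layer': its only learnable parameter is a natural-language prompt."""

    def __init__(self, llm: LLM, prompt: str):
        self.llm = llm
        self.prompt = prompt   # the natural-language "weights" of this layer

    def __call__(self, x: str) -> str:
        # One forward pass: condition the model on the layer's prompt plus the input.
        return self.llm(f"{self.prompt}\n\nInput: {x}\nOutput:")

if __name__ == "__main__":
    # Toy model so the example runs offline: it just echoes the input field.
    toy_llm: LLM = lambda text: text.split("Input: ")[-1].split("\n")[0]
    layer1 = LanguageLayer(toy_llm, "Rewrite the input as a question.")
    layer2 = LanguageLayer(toy_llm, "Answer the question concisely.")
    print(layer2(layer1("The sky is blue.")))
```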
The growing utilization of machine learning (ML) in decision-making processes raises questions about its benefits to society. In this study, we identify and analyze three axes of heterogeneity that significantly influence the trajectory of ML product…
External link:
http://arxiv.org/abs/2306.10043
Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the TD error…
External link:
http://arxiv.org/abs/2305.15249
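The actor-critic abstract notes that the critic is usually trained by minimizing the TD error. The snippet below spells out the standard one-step version, delta = r + gamma * V(s') - V(s), with a mean-squared-error critic objective; the numbers in the example are arbitrary.

```python
import numpy as np

def td_error(v_s: float, v_next: float, reward: float,
             gamma: float = 0.99, done: bool = False) -> float:
    """One-step temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    target = reward + (0.0 if done else gamma * v_next)
    return target - v_s

def critic_loss(deltas: np.ndarray) -> float:
    """The usual critic objective: mean squared TD error over a batch of transitions."""
    return float(np.mean(np.square(deltas)))

if __name__ == "__main__":
    batch = np.array([
        td_error(v_s=0.5, v_next=0.6, reward=1.0),
        td_error(v_s=0.2, v_next=0.0, reward=0.0, done=True),
    ])
    print(critic_loss(batch))
```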
Author:
Lavington, Jonathan Wilder, Vaswani, Sharan, Babanezhad, Reza, Schmidt, Mark, Roux, Nicolas Le
We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient. Such functions are prevalent in reinforcement learning, imitation learning and adversarial training. Our target optimization framework uses the…
External link:
http://arxiv.org/abs/2302.02607
Author:
Caccia, Lucas, Ponti, Edoardo, Su, Zhan, Pereira, Matheus, Roux, Nicolas Le, Sordoni, Alessandro
Parameter-efficient fine-tuning (PEFT) for cross-task generalization consists in pre-training adapters on a multi-task training set before few-shot adaptation to test tasks. Polytropon [Ponti et al., 2023] (Poly) jointly learns an inventory…
External link:
http://arxiv.org/abs/2211.03831
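The Polytropon abstract describes jointly learning an inventory of adapters together with a routing function that combines them per task. The sketch below only illustrates the combination step under simplifying assumptions (flattened adapters, softmax routing); it is not the paper's implementation.

```python
import numpy as np

def combine_adapters(modules: np.ndarray, routing_logits: np.ndarray) -> np.ndarray:
    """Blend a shared inventory of adapter modules into one task-specific adapter.

    modules:        (n_modules, d) array, one flattened adapter per row
    routing_logits: (n_modules,) per-task scores, learned jointly with the modules
    """
    weights = np.exp(routing_logits - routing_logits.max())
    weights /= weights.sum()            # soft routing over the inventory
    return weights @ modules            # (d,) combined adapter parameters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inventory = rng.normal(size=(4, 8))             # 4 modules, 8 parameters each
    task_logits = np.array([2.0, 0.1, -1.0, 0.5])   # one task's routing scores
    print(combine_adapters(inventory, task_logits).shape)
```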