Showing 1 - 10 of 1,602 for search: '"A. Jelassi"'
In this work, we explore the limitations of combining models by averaging intermediate features, referred to as model merging, and propose a new direction for achieving collective model intelligence through what we call compatible specialization. Cur…
External link:
http://arxiv.org/abs/2411.02207
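The weight-averaging form of model merging touched on above can be sketched in a few lines; the uniform weighting and the `merge_state_dicts` helper are illustrative assumptions, not the paper's exact procedure (which studies averaging of intermediate features).

```python
# Hypothetical sketch of model merging by parameter averaging.
# Uniform weights are an assumption; the paper examines averaging
# intermediate features, which this only approximates in spirit.

def merge_state_dicts(state_dicts, weights=None):
    """Average a list of same-shaped model state dicts key by key."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key] for w, sd in zip(weights, state_dicts))
    return merged

# Toy usage with plain floats standing in for tensors:
a = {"layer.w": 2.0, "layer.b": 0.0}
b = {"layer.w": 4.0, "layer.b": 2.0}
m = merge_state_dicts([a, b])
```

With real checkpoints the values would be tensors, but the key-by-key weighted sum is the same.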
Author:
Jelassi, Samy, Mohri, Clara, Brandfonbrener, David, Gu, Alex, Vyas, Nikhil, Anand, Nikhil, Alvarez-Melis, David, Li, Yuanzhi, Kakade, Sham M., Malach, Eran
The Mixture-of-Experts (MoE) architecture enables a significant increase in the total number of model parameters with minimal computational overhead. However, it is not clear what performance tradeoffs, if any, exist between MoEs and standard dense t…
External link:
http://arxiv.org/abs/2410.19034
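The "more parameters at little extra compute" property of MoE comes from routing each token to only a few experts. A minimal NumPy sketch, where the expert count, `top_k=1` routing, and linear experts are illustrative choices rather than the paper's architecture:

```python
# Minimal top-k mixture-of-experts layer; only top_k experts run per
# input row, so total parameters can grow with the number of experts
# at roughly constant per-token compute. All shapes are toy choices.
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=1):
    """Route each input row to its top-k experts and mix their outputs."""
    logits = x @ gate_w                      # (batch, n_experts)
    top = np.argsort(logits, axis=1)[:, -top_k:]
    out = np.zeros_like(x)
    for i, experts in enumerate(top):
        probs = np.exp(logits[i, experts])
        probs /= probs.sum()                 # softmax over chosen experts
        for p, e in zip(probs, experts):
            out[i] += p * (x[i] @ expert_ws[e])
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 3))
expert_ws = [rng.normal(size=(8, 8)) for _ in range(3)]
y = moe_forward(x, gate_w, expert_ws)
```

A dense layer of equal total parameter count would multiply every input by every expert matrix; here each row touches only its selected expert.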
Author:
Prabhakar, Akshara, Li, Yuanzhi, Narasimhan, Karthik, Kakade, Sham, Malach, Eran, Jelassi, Samy
Low-Rank Adaptation (LoRA) is a popular technique for parameter-efficient fine-tuning of Large Language Models (LLMs). We study how different LoRA modules can be merged to achieve skill composition -- testing the performance of the merged model on a…
External link:
http://arxiv.org/abs/2410.13025
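One simple way to merge LoRA modules is to concatenate their low-rank factors, which makes the merged update the exact sum of the individual updates. This is a generic sketch, not necessarily the merging method studied in the paper:

```python
# Toy sketch of combining two LoRA updates W + B1@A1 and W + B2@A2
# by concatenating factors: B_cat @ A_cat = B1@A1 + B2@A2, so the
# merged adapter has rank up to the sum of the individual ranks.
# The alpha scaling is an illustrative assumption.
import numpy as np

def merge_lora(a_list, b_list, alpha=1.0):
    """Return a single (A, B) pair whose product is the sum of the inputs."""
    A = np.concatenate(a_list, axis=0)          # (r1+r2, d_in)
    B = np.concatenate(b_list, axis=1) * alpha  # (d_out, r1+r2)
    return A, B

d_in, d_out, r = 6, 4, 2
rng = np.random.default_rng(1)
A1, B1 = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))
A2, B2 = rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r))
A, B = merge_lora([A1, A2], [B1, B2])
delta = B @ A                                   # equals B1@A1 + B2@A2
```

Averaging the full-rank deltas instead would keep the rank budget fixed but mix the two skills' updates; concatenation preserves each update exactly.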
Author:
Kaada, Soumeya, Tran, Dinh-Hieu, Van Huynh, Nguyen, Morel, Marie-Line Alberi, Jelassi, Sofiene, Rubino, Gerardo
Resilience is defined as the ability of a network to resist, adapt, and quickly recover from disruptions, and to continue to maintain an acceptable level of services from users' perspective. With the advent of future radio networks, including advance…
External link:
http://arxiv.org/abs/2407.18066
Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve le…
External link:
http://arxiv.org/abs/2407.03310
Overparameterization, the condition where models have more parameters than necessary to fit their training loss, is a crucial factor for the success of deep learning. However, the characteristics of the features learned by overparameterized networks…
External link:
http://arxiv.org/abs/2407.00968
We analyze an optimization problem of the conductivity in a composite material arising in a heat conduction energy storage problem. The model is described by the heat equation that specifies the heat exchange between two types of materials with diffe…
External link:
http://arxiv.org/abs/2403.20181
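A two-material conductivity-optimization problem of this general kind can be written as follows; the specific objective functional, source term, and constraint set here are illustrative assumptions, not the formulation used in the paper:

```latex
% Generic shape of a two-material conductivity optimization problem;
% the objective and constraints are illustrative, not the paper's.
\begin{align*}
  \min_{\kappa} \; & J(\kappa) = \int_0^T \!\! \int_\Omega
      \lvert u - u_{\mathrm{target}} \rvert^2 \, dx \, dt \\
  \text{s.t.} \quad & \partial_t u - \nabla \cdot \bigl( \kappa(x) \nabla u \bigr) = f
      \quad \text{in } \Omega \times (0, T), \\
  & \kappa(x) \in \{ \kappa_1, \kappa_2 \} \quad \text{a.e. in } \Omega ,
\end{align*}
```

where $\kappa_1, \kappa_2$ are the conductivities of the two materials and $u$ is the temperature field.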
Author:
Li, Kenneth, Jelassi, Samy, Zhang, Hugh, Kakade, Sham, Wattenberg, Martin, Brandfonbrener, David
We present an approach called Q-probing to adapt a pre-trained language model to maximize a task-specific reward function. At a high level, Q-probing sits between heavier approaches such as finetuning and lighter approaches such as few-shot prompting…
External link:
http://arxiv.org/abs/2402.14688
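The intermediate weight class that probing occupies can be sketched as reranking: a small linear probe scores candidate completions from a frozen model's hidden states, and generation keeps the highest-scoring sample. The pooled features, probe weights, and selection rule below are stand-ins for illustration, not the paper's implementation:

```python
# Hedged sketch of probe-based reranking: the base model stays frozen,
# and only a tiny linear probe over pooled hidden states is learned.
# Features and weights here are hand-picked stand-ins.
import numpy as np

def probe_score(hidden, w):
    """Linear value probe on a pooled hidden representation."""
    return float(hidden @ w)

def pick_best(candidates, hidden_states, w):
    """Return the candidate whose hidden state the probe scores highest."""
    scores = [probe_score(h, w) for h in hidden_states]
    return candidates[int(np.argmax(scores))]

w = np.array([1.0, -0.5, 0.0])                 # trained probe (stand-in)
candidates = ["answer A", "answer B"]
hidden_states = [np.array([0.2, 0.4, 0.1]),    # pooled states (stand-ins)
                 np.array([0.9, 0.1, 0.3])]
best = pick_best(candidates, hidden_states, w)
```

Because only `w` is trained, this costs far less than finetuning while still steering outputs more directly than prompting alone.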
Transformers are the dominant architecture for sequence modeling, but there is growing interest in models that use a fixed-size latent state that does not depend on the sequence length, which we refer to as "generalized state space models" (GSSMs). I…
External link:
http://arxiv.org/abs/2402.01032
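The defining property of a GSSM, a latent state whose size is fixed regardless of sequence length, is easiest to see in a minimal linear state space recurrence; the matrices below are arbitrary illustrations, not a model from the paper:

```python
# Minimal linear state space recurrence: memory cost is constant in
# sequence length (one state vector h), unlike attention, whose cache
# grows with the sequence. Matrices are arbitrary illustrations.
import numpy as np

def ssm_scan(A, B, C, inputs):
    """h_t = A h_{t-1} + B x_t ;  y_t = C h_t  (state size fixed)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x in inputs:
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.stack(ys)

A = 0.9 * np.eye(2)               # state transition (decay)
B = np.array([[1.0], [0.5]])      # input projection
C = np.array([[1.0, 1.0]])        # output readout
ys = ssm_scan(A, B, C, [np.array([1.0])] * 3)
```

However long `inputs` is, the model carries only the fixed-size vector `h` between steps, which is exactly the compression that the attention-vs-GSSM comparison turns on.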
Author:
Rim Rejaibi, Arnaud Guille, Maroua Manai, Jose Adelaide, Emilie Agavnian, Aida Jelassi, Raoudha Doghri, Emmanuelle Charafe-Jauffret, François Bertucci, Mohamed Manai, Karima Mrad, Lamia Charfi, Renaud Sabatier
Published in:
Scientific Reports, Vol 14, Iss 1, Pp 1-12 (2024)
Abstract: Ovarian cancer (OC) is one of the most common cancers in women, with a high mortality rate. Most published studies have focused on Caucasian populations, leaving a need to explore the biological features and clinical outcomes of patients…
External link:
https://doaj.org/article/30285a921fbd4f2f8e43df5abf2491dd