Showing 1 - 10 of 2,206 results for search: '"Ablin, A"'
Author:
Kirchhof, Michael, Thornton, James, Ablin, Pierre, Béthune, Louis, Ndiaye, Eugene, Cuturi, Marco
The increased adoption of diffusion models in text-to-image generation has triggered concerns about their reliability. Such models are now closely scrutinized under the lens of various metrics, notably calibration, fairness, or compute efficiency. …
External link:
http://arxiv.org/abs/2410.06025
The composition of training data mixtures is critical for effectively training large language models (LLMs), as it directly impacts their performance on downstream tasks. Our goal is to identify an optimal data mixture to specialize an LLM …
External link:
http://arxiv.org/abs/2410.02498
Specialist language models (LMs) focus on a specific task or domain, on which they often outperform generalist LMs of the same size. However, the specialist data needed to pretrain these models is only available in limited amounts for most tasks. …
External link:
http://arxiv.org/abs/2410.03735
Author:
Ramapuram, Jason, Danieli, Federico, Dhekane, Eeshan, Weers, Floris, Busbridge, Dan, Ablin, Pierre, Likhomanenko, Tatiana, Digani, Jagrit, Gu, Zijin, Shidani, Amitis, Webb, Russ
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted sum of values. The weights are typically obtained as the softmax of dot products between keys and queries. … (see the sketch below)
External link:
http://arxiv.org/abs/2409.04431
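The snippet above describes attention as a mapping that turns each sequence element into a softmax-weighted sum of values. Below is a minimal NumPy sketch of plain single-head dot-product attention in that spirit; the shapes, function names, and the 1/sqrt(d) scaling are illustrative assumptions, not the specific attention variant studied in the linked paper.

    import numpy as np

    def softmax(z, axis=-1):
        # Numerically stable softmax along the given axis.
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        # Each output row is a weighted sum of the rows of V; the weights are
        # the softmax of (scaled) dot products between queries and keys.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        return softmax(scores, axis=-1) @ V

    # Toy usage: a sequence of 4 tokens with dimension 8.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
    out = attention(Q, K, V)   # shape (4, 8)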
Momentum-based optimizers are central to a wide range of machine learning applications. These typically rely on an Exponential Moving Average (EMA) of gradients, which exponentially decays the contribution of older gradients. … (see the sketch below)
External link:
http://arxiv.org/abs/2409.03137
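To make the EMA mentioned above concrete, here is a minimal NumPy sketch of a momentum step based on an exponential moving average of gradients; the step size, decay beta, and function name are assumptions for illustration, not the optimizer proposed in the linked paper.

    import numpy as np

    def ema_momentum_step(params, grad, m, lr=0.1, beta=0.9):
        # m is an exponential moving average of past gradients: each step the
        # old average is shrunk by beta, so a gradient seen k steps ago
        # contributes with weight (1 - beta) * beta**k.
        m = beta * m + (1.0 - beta) * grad
        params = params - lr * m
        return params, m

    # Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    x = np.array([1.0, -2.0])
    m = np.zeros_like(x)
    for _ in range(100):
        x, m = ema_momentum_step(x, x, m)
    # x is now close to the minimizer 0.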
Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices, such as canonical correlation analysis (CCA), independent component analysis, … (see the sketch below)
External link:
http://arxiv.org/abs/2405.01702
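For intuition about the constraint above: a point of the generalized Stiefel manifold is an n x p matrix X with X^T B X = I_p, where B is symmetric positive definite (e.g. a sampled covariance). The NumPy sketch below builds such a point by whitening a random matrix; it only illustrates the feasible set and is not the optimization method of the linked paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 8, 3

    # B: a symmetric positive-definite matrix, here a sampled covariance.
    Z = rng.standard_normal((200, n))
    B = Z.T @ Z / 200

    # Map an arbitrary full-rank n x p matrix A onto the feasible set
    # {X : X^T B X = I_p} via X = A (A^T B A)^{-1/2}.
    A = rng.standard_normal((n, p))
    M = A.T @ B @ A
    w, V = np.linalg.eigh(M)                     # M = V diag(w) V^T, w > 0
    X = A @ (V @ np.diag(w ** -0.5) @ V.T)

    print(np.allclose(X.T @ B @ X, np.eye(p)))   # True: constraint satisfied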
Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called … (see the sketch below)
External link:
http://arxiv.org/abs/2402.16748
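As a toy instance of the bilevel setting above, the sketch below takes ridge regression as the inner problem, a validation loss as the outer objective, and computes the hypergradient with respect to the regularization strength by implicit differentiation, checked against finite differences. The data, names, and closed-form inner solver are assumptions for illustration, not the method of the linked paper.

    import numpy as np

    rng = np.random.default_rng(0)
    n_tr, n_val, d = 50, 20, 5
    X_tr, y_tr = rng.standard_normal((n_tr, d)), rng.standard_normal(n_tr)
    X_val, y_val = rng.standard_normal((n_val, d)), rng.standard_normal(n_val)

    def inner_solution(lam):
        # Inner problem: ridge regression, available here in closed form.
        A = X_tr.T @ X_tr + lam * np.eye(d)
        return np.linalg.solve(A, X_tr.T @ y_tr)

    def outer_loss(lam):
        # Outer objective: validation loss evaluated at the inner solution.
        r = X_val @ inner_solution(lam) - y_val
        return 0.5 * r @ r / n_val

    def hypergradient(lam):
        # Implicit differentiation: dw*/dlam = -(X^T X + lam I)^{-1} w*.
        A = X_tr.T @ X_tr + lam * np.eye(d)
        w = np.linalg.solve(A, X_tr.T @ y_tr)
        dw = -np.linalg.solve(A, w)
        grad_w = X_val.T @ (X_val @ w - y_val) / n_val
        return grad_w @ dw

    lam, eps = 0.3, 1e-6
    fd = (outer_loss(lam + eps) - outer_loss(lam - eps)) / (2 * eps)
    print(hypergradient(lam), fd)   # the two numbers agree closely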
Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). … (see the sketch below)
External link:
http://arxiv.org/abs/2402.02998
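A minimal sketch of the pattern described above: a main training loss combined with a weighted auxiliary objective (here, agreement with a prior), minimized by plain gradient descent. The particular losses, the weight alpha = 0.1, and the prior are placeholder assumptions, not the scheme studied in the linked paper.

    import numpy as np

    rng = np.random.default_rng(0)
    X, y = rng.standard_normal((30, 4)), rng.standard_normal(30)
    w_prior = np.zeros(4)                 # a prior the model should agree with

    def train_loss(w):
        r = X @ w - y
        return 0.5 * r @ r / len(y)

    def aux_loss(w):
        # Auxiliary objective: squared distance to the prior.
        return 0.5 * np.sum((w - w_prior) ** 2)

    def total_loss(w, alpha=0.1):
        # Training loss plus a weighted auxiliary term.
        return train_loss(w) + alpha * aux_loss(w)

    # Gradient descent on the combined objective (alpha = 0.1).
    w = np.zeros(4)
    for _ in range(200):
        grad = X.T @ (X @ w - y) / len(y) + 0.1 * (w - w_prior)
        w -= 0.1 * grad
    print(total_loss(w))                  # combined objective after training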
Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference, but their lower capacity means that their performance can be good only if one limits their scope to a specialized …
External link:
http://arxiv.org/abs/2402.01093
Self-attention and masked self-attention are at the heart of Transformers' outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties, which are key when it comes to analyzing robustness and …
External link:
http://arxiv.org/abs/2312.14820