Showing 1 - 10 of 500 for search: '"Peyré, Gabriel"'
Author:
Sander, Michael E., Peyré, Gabriel
Causal Transformers are trained to predict the next token for a given context. While it is widely accepted that self-attention is crucial for encoding the causal structure of sequences, the precise underlying mechanism behind this in-context autoregressive…
External link:
http://arxiv.org/abs/2410.03011
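As a rough illustration of the mechanism this abstract refers to, here is a minimal NumPy sketch of masked (causal) self-attention, the component that restricts each position to its past context and thereby supports next-token training; the dimensions and projection matrices are illustrative placeholders, not the model studied in the paper.

```python
import numpy as np

def causal_self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention with a causal mask.
    X: (T, d) token embeddings; position t may only attend to
    positions <= t, which is what makes next-token training causal."""
    T, d = X.shape
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)                      # (T, T) attention logits
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # True strictly above diagonal
    scores[mask] = -np.inf                             # block attention to the future
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                  # row-wise softmax
    return w @ V                                       # (T, d) context-dependent outputs

rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
Wq = rng.normal(size=(d, d)) / np.sqrt(d)
Wk = rng.normal(size=(d, d)) / np.sqrt(d)
Wv = rng.normal(size=(d, d)) / np.sqrt(d)
print(causal_self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```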
Transformers are deep architectures that define "in-context mappings" which enable predicting new tokens based on a given set of tokens (such as a prompt in NLP applications or a set of patches for a vision transformer). In this work, we study in particular…
External link:
http://arxiv.org/abs/2408.01367
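A hedged sketch of the "in-context mapping" viewpoint: a single attention readout whose output for a query depends on the entire set of context tokens rather than on any per-task learned weights. The linear task and the softmax readout below are illustrative assumptions, not the construction from the paper.

```python
import numpy as np

def attention_readout(context_x, context_y, query, tau=1.0):
    """One attention layer as an in-context mapping: the prediction
    for a query is a function of the whole context set (x_i, y_i)."""
    logits = context_x @ query / tau   # similarity of query to each context point
    w = np.exp(logits - logits.max())
    w /= w.sum()                       # softmax attention weights over the context
    return w @ context_y               # attention-weighted average of the labels

rng = np.random.default_rng(1)
xs = rng.normal(size=(20, 3))
ys = xs @ np.array([1.0, -2.0, 0.5])   # unknown linear task, given only in context
q = rng.normal(size=3)
print(attention_readout(xs, ys, q))
```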
Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training. Yet, their existence and principles for non-Euclidean geometries and momentum-based dynamics remain largely…
External link:
http://arxiv.org/abs/2405.12888
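For intuition, the following sketch numerically checks the classical Euclidean conservation law for a two-layer linear network: W1 W1^T - W2^T W2 is invariant along the gradient flow, and gradient descent with a small step (explicit Euler on the flow) preserves it up to discretization error. This is the baseline Euclidean case the abstract starts from, not the non-Euclidean or momentum setting it studies; all sizes and steps are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, o, n = 4, 6, 3, 50
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, o))
W1 = 0.3 * rng.normal(size=(h, d))
W2 = 0.3 * rng.normal(size=(o, h))

def invariant(W1, W2):
    # classical gradient-flow conservation law for linear networks
    return W1 @ W1.T - W2.T @ W2

C0 = invariant(W1, W2)
lr = 1e-3  # small step: gradient descent approximates the flow
for _ in range(5000):
    R = X @ W1.T @ W2.T - Y          # residuals of the end-to-end linear map
    G = R.T @ X / n                  # gradient of the loss w.r.t. W2 @ W1
    dW1, dW2 = W2.T @ G, G @ W1.T    # chain rule, evaluated simultaneously
    W1 -= lr * dW1
    W2 -= lr * dW2

print(np.abs(invariant(W1, W2) - C0).max())  # small: drift is O(lr)
```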
We study the convergence of gradient flow for the training of deep neural networks. If Residual Neural Networks are a popular example of very deep architectures, their training constitutes a challenging optimization problem due notably to the non-convexity…
External link:
http://arxiv.org/abs/2403.12887
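A minimal sketch of the kind of architecture involved: a residual network whose blocks are scaled by 1/depth so that the overall map stays close to the identity, a common regime in gradient-flow analyses of very deep ResNet training. The tanh nonlinearity and the 1/depth scaling are illustrative assumptions, not necessarily the paper's setting.

```python
import numpy as np

def resnet_forward(x, weights, scale):
    """Residual network: x_{l+1} = x_l + scale * V_l tanh(W_l x_l).
    With scale = 1/depth each block is a small perturbation of the identity."""
    for W, V in weights:
        x = x + scale * V @ np.tanh(W @ x)
    return x

rng = np.random.default_rng(2)
depth, d = 32, 5
weights = [(rng.normal(size=(d, d)) / np.sqrt(d),
            rng.normal(size=(d, d)) / np.sqrt(d)) for _ in range(depth)]
x = rng.normal(size=d)
print(resnet_forward(x, weights, scale=1.0 / depth))
```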
Bilevel optimization aims to optimize an outer objective function that depends on the solution to an inner optimization problem. It is routinely used in Machine Learning, notably for hyperparameter tuning. The conventional method to compute the so-called…
External link:
http://arxiv.org/abs/2402.16748
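To make the setup concrete, here is a hedged sketch of hypergradient computation by implicit differentiation on a toy instance: the inner problem is ridge regression (solvable in closed form) and the outer objective is a validation loss; the result is checked against finite differences. The ridge example is an illustration of the standard technique, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, d = 40, 20, 5
Xtr, Xval = rng.normal(size=(n, d)), rng.normal(size=(m, d))
w_true = rng.normal(size=d)
ytr = Xtr @ w_true + 0.1 * rng.normal(size=n)
yval = Xval @ w_true + 0.1 * rng.normal(size=m)

def inner_solution(lam):
    # inner problem: ridge regression, closed-form minimizer
    A = Xtr.T @ Xtr + lam * np.eye(d)
    return np.linalg.solve(A, Xtr.T @ ytr)

def hypergradient(lam):
    # implicit differentiation: dw*/dlam = -(Xtr^T Xtr + lam I)^{-1} w*
    w = inner_solution(lam)
    A = Xtr.T @ Xtr + lam * np.eye(d)
    dw = -np.linalg.solve(A, w)
    r = Xval @ w - yval                 # outer (validation) residual
    return r @ (Xval @ dw)

lam, eps = 0.5, 1e-6
fd = (0.5 * np.sum((Xval @ inner_solution(lam + eps) - yval) ** 2)
      - 0.5 * np.sum((Xval @ inner_solution(lam - eps) - yval) ** 2)) / (2 * eps)
print(hypergradient(lam), fd)   # the two values should agree closely
```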
Transformers have achieved state-of-the-art performance in language modeling tasks. However, the reasons behind their tremendous success are still unclear. In this paper, towards a better understanding, we train a Transformer model on a simple next token…
External link:
http://arxiv.org/abs/2402.05787
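A minimal sketch of the training signal in question: sequences drawn from a random bigram (Markov) source, scored with the shifted cross-entropy next-token objective. The bigram source is an assumed stand-in for whatever "simple task" the paper uses, and the random logits stand in for a trained Transformer's outputs.

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Shifted cross-entropy: the logits at position t are scored
    against the token actually observed at position t + 1."""
    T, V = logits.shape
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    targets = tokens[1:]                   # next token is the label
    return -log_probs[np.arange(T - 1), targets].mean()

rng = np.random.default_rng(4)
V, T = 6, 50
P = rng.dirichlet(np.ones(V), size=V)      # random bigram (Markov) transitions
tokens = [0]
for _ in range(T - 1):
    tokens.append(rng.choice(V, p=P[tokens[-1]]))
tokens = np.array(tokens)

logits = rng.normal(size=(T, V))           # stand-in for a model's outputs
print(next_token_loss(logits, tokens))
```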
Self-attention and masked self-attention are at the heart of Transformers' outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties - which are key when it comes to analyzing robustness and expressive power…
External link:
http://arxiv.org/abs/2312.14820
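For a concrete handle on the quantity discussed, this sketch estimates a local Lipschitz constant of plain (parameter-free) dot-product self-attention at one input, via a finite-difference Jacobian and its spectral norm. This only probes the map locally around a single point; the paper's concern is the global Lipschitz behaviour, and dropping the projection matrices is a simplifying assumption.

```python
import numpy as np

def self_attention(x, T, d):
    """Unmasked dot-product self-attention without projections."""
    X = x.reshape(T, d)
    S = X @ X.T / np.sqrt(d)
    W = np.exp(S - S.max(axis=1, keepdims=True))
    W /= W.sum(axis=1, keepdims=True)
    return (W @ X).ravel()

def local_lipschitz(f, x, eps=1e-5):
    # finite-difference Jacobian; its spectral norm is a local
    # Lipschitz estimate of f around x
    n = x.size
    J = np.empty((f(x).size, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        J[:, i] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.linalg.norm(J, 2)

rng = np.random.default_rng(5)
T, d = 4, 3
x = rng.normal(size=T * d)
print(local_lipschitz(lambda v: self_attention(v, T, d), x))
```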
Author:
Poon, Clarice, Peyré, Gabriel
Super-resolution of pointwise sources is of utmost importance in various areas of imaging sciences. Specific instances of this problem arise in single molecule fluorescence, spike sorting in neuroscience, astrophysical imaging, radar imaging, and nuclear magnetic resonance…
External link:
http://arxiv.org/abs/2311.09928
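As a small illustration of the recovery problem, the sketch below observes two nearby point sources through a Gaussian point-spread function and recovers a sparse amplitude vector with ISTA on a LASSO objective, a standard convex surrogate for spike super-resolution on a grid; the grid, PSF width, and regularization level are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(6)
grid = np.linspace(0, 1, 200)                  # candidate spike locations
sigma = 0.03
Phi = np.exp(-(grid[None, :] - grid[:, None]) ** 2 / (2 * sigma ** 2))  # Gaussian PSF
a_true = np.zeros(200)
a_true[[60, 75]] = [1.0, 0.7]                  # two nearby point sources
y = Phi @ a_true + 0.01 * rng.normal(size=200) # blurred, noisy observation

# ISTA for the LASSO  min_a 0.5*||Phi a - y||^2 + lam*||a||_1
lam = 0.05
L = np.linalg.norm(Phi, 2) ** 2                # Lipschitz constant of the gradient
a = np.zeros(200)
for _ in range(2000):
    g = Phi.T @ (Phi @ a - y)                  # gradient of the data-fit term
    a = a - g / L
    a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0)  # soft-thresholding

print(np.flatnonzero(a > 0.1))  # recovered support should cluster near 60 and 75
```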
Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points. When these measures live in the same metric space…
External link:
http://arxiv.org/abs/2311.05788
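For reference, the linear OT problem this abstract starts from can be solved exactly as a small linear program over couplings; the sketch below uses scipy.optimize.linprog with a squared-Euclidean ground cost on random point clouds (the cost and the point clouds are illustrative).

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
n, m = 5, 6
x, y = rng.normal(size=(n, 2)), rng.normal(size=(m, 2))
a = np.full(n, 1.0 / n)                       # source weights
b = np.full(m, 1.0 / m)                       # target weights
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean ground cost

# linear OT: min <C, P>  s.t.  P 1 = a,  P^T 1 = b,  P >= 0
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1            # row-sum (source marginal) constraints
for j in range(m):
    A_eq[n + j, j::m] = 1                     # column-sum (target marginal) constraints
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]), bounds=(0, None))
P = res.x.reshape(n, m)                       # optimal coupling
print(res.fun, P.sum())                       # transport cost; total mass is 1
```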
Optimal Transport is a useful metric to compare probability distributions and to compute a pairing given a ground cost. Its entropic regularization variant (eOT) is crucial to have fast algorithms and reflect fuzzy/noisy matchings. This work focuses…
External link:
http://arxiv.org/abs/2310.05461
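A minimal Sinkhorn sketch for the entropic variant (eOT) mentioned in the abstract: alternating scaling updates that match the two marginals of the kernel exp(-C/eps). The cost normalization and the value of eps are illustrative choices for numerical stability, not tied to the paper.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.05, iters=1000):
    """Entropic OT by Sinkhorn's algorithm: alternate scalings of the
    kernel K = exp(-C / eps) so the coupling matches both marginals."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                 # fit the target marginal
        u = a / (K @ v)                   # fit the source marginal
    return u[:, None] * K * v[None, :]    # coupling diag(u) K diag(v)

rng = np.random.default_rng(8)
n = 6
x, y = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
C = (x - y.T) ** 2
C /= C.max()                     # normalize the cost to avoid exp underflow
a = b = np.full(n, 1.0 / n)
P = sinkhorn(C, a, b)
print(P.sum(axis=1), P.sum(axis=0))   # both close to the uniform marginals
```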