Search results

Report

Conduction band resonant states absorption for quantum dot infrared detectors operating at room temperature

Authors: Vichi, Stefano; Asahi, Shigeo; Bietti, Sergio; Tuktamyshev, Artur; Fedorov, Alexey; Kita, Takashi; Sanguinetti, Stefano

Long Wavelength infrared devices, despite growing interest due to a wide range of applications in commercial, public, and academic sectors, are still struggling to achieve significant improvements over well-established technologies like HgCdTe detect

External link: http://arxiv.org/abs/2407.18302

Report

How Truncating Weights Improves Reasoning in Language Models

Authors: Chen, Lei; Bruna, Joan; Bietti, Alberto

In addition to the ability to generate fluent text in various languages, large language models have been successful at tasks that involve basic forms of logical "reasoning" over their context. Recent work found that selectively removing certain compo

External link: http://arxiv.org/abs/2406.03068

Report

Contextual Counting: A Mechanistic Study of Transformers on a Quantitative Task

Authors: Golkar, Siavash; Bietti, Alberto; Pettee, Mariel; Eickenberg, Michael; Cranmer, Miles; Hirashima, Keiya; Krawezik, Geraud; Lourie, Nicholas; McCabe, Michael; Morel, Rudy; Ohana, Ruben; Parker, Liam Holden; Blancard, Bruno Régaldo-Saint; Cho, Kyunghyun; Ho, Shirley

Transformers have revolutionized machine learning across diverse domains, yet understanding their behavior remains crucial, particularly in high-stakes applications. This paper introduces the contextual counting task, a novel toy problem aimed at enh

External link: http://arxiv.org/abs/2406.02585

Report

Level Set Teleportation: An Optimization Perspective

Authors: Mishkin, Aaron; Bietti, Alberto; Gower, Robert M.

We study level set teleportation, an optimization sub-routine which seeks to accelerate gradient methods by maximizing the gradient norm on a level-set of the objective function. Since the descent lemma implies that gradient descent (GD) decreases th

External link: http://arxiv.org/abs/2403.03362

Report

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Authors: Kunstner, Frederik; Yadav, Robin; Milligan, Alan; Schmidt, Mark; Bietti, Alberto

Adam has been shown to outperform gradient descent on large language models by a larger margin than on other tasks, but it is unclear why. We show that a key factor in this performance gap is the heavy-tailed class imbalance found in language tasks.

External link: http://arxiv.org/abs/2402.19449

Report

Learning Associative Memories with Gradient Descent

Authors: Cabannes, Vivien; Simsek, Berfin; Bietti, Alberto

This work focuses on the training dynamics of one associative memory module storing outer products of token embeddings. We reduce this problem to the study of a system of particles, which interact according to properties of the data distribution and

External link: http://arxiv.org/abs/2402.18724

Report

On Learning Gaussian Multi-index Models with Gradient Flow

Authors: Bietti, Alberto; Bruna, Joan; Pillaud-Vivien, Loucas

We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such,

External link: http://arxiv.org/abs/2310.19793

Report

AstroCLIP: A Cross-Modal Foundation Model for Galaxies

Authors: Parker, Liam; Lanusse, Francois; Golkar, Siavash; Sarra, Leopoldo; Cranmer, Miles; Bietti, Alberto; Eickenberg, Michael; Krawezik, Geraud; McCabe, Michael; Ohana, Ruben; Pettee, Mariel; Blancard, Bruno Regaldo-Saint; Tesileanu, Tiberiu; Cho, Kyunghyun; Ho, Shirley

We present AstroCLIP, a single, versatile model that can embed both galaxy images and spectra into a shared, physically meaningful latent space. These embeddings can then be used - without any model fine-tuning - for a variety of downstream tasks inc

External link: http://arxiv.org/abs/2310.03024

Report

Multiple Physics Pretraining for Physical Surrogate Models

Authors: McCabe, Michael; Blancard, Bruno Régaldo-Saint; Parker, Liam Holden; Ohana, Ruben; Cranmer, Miles; Bietti, Alberto; Eickenberg, Michael; Golkar, Siavash; Krawezik, Geraud; Lanusse, Francois; Pettee, Mariel; Tesileanu, Tiberiu; Cho, Kyunghyun; Ho, Shirley

We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling. MPP involves training large surrogate models to predict the dynamics of multiple heterogeneous physical systems sim

External link: http://arxiv.org/abs/2310.02994

Report

xVal: A Continuous Number Encoding for Large Language Models

Authors: Golkar, Siavash; Pettee, Mariel; Eickenberg, Michael; Bietti, Alberto; Cranmer, Miles; Krawezik, Geraud; Lanusse, Francois; McCabe, Michael; Ohana, Ruben; Parker, Liam; Blancard, Bruno Régaldo-Saint; Tesileanu, Tiberiu; Cho, Kyunghyun; Ho, Shirley

Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a si

External link: http://arxiv.org/abs/2310.02989
