Showing 1 - 10 of 1,011 results for search: '"A Nichani"'
Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate proportional to their parameter count. In our work, we show …
External link:
http://arxiv.org/abs/2412.06538
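
The snippet above (arXiv:2412.06538) concerns how much factual information a model can store relative to its parameter count. As a loose illustration of that scaling, and not the paper's construction, the following Python sketch builds a linear associative memory: a d x d matrix formed from outer products of key and value vectors recalls stored pairs up to cross-talk, so the number of stored scalars grows with the number of parameters in the matrix.

import numpy as np

# Illustrative linear associative memory (an assumption for this sketch, not
# the paper's model): W = sum_i v_i u_i^T stores key-value pairs and recalls
# them with one matrix-vector product. With random near-orthogonal keys in
# d dimensions, roughly O(d) pairs of d-dimensional values can be recovered,
# so the stored scalar count scales with the d*d parameters of W.
rng = np.random.default_rng(0)
d, n_pairs = 1024, 32

keys = rng.standard_normal((n_pairs, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)   # unit-norm keys
values = rng.standard_normal((n_pairs, d))

W = values.T @ keys                     # sum of outer products v_i u_i^T
recalled = (W @ keys.T).T               # W u_j is approximately v_j plus cross-talk
rel_err = np.linalg.norm(recalled - values) / np.linalg.norm(values)
print(f"relative recall error: {rel_err:.2f}")        # modest when n_pairs << d
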
In deep learning theory, a critical question is to understand how neural networks learn hierarchical features. In this work, we study the learning of hierarchical polynomials of \textit{multiple nonlinear features} using three-layer neural networks.
External link:
http://arxiv.org/abs/2411.17201
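
A toy example of the target class described above (arXiv:2411.17201): a polynomial g applied to several nonlinear features of a Gaussian input. The specific quadratic features and the choice of g below are assumptions for illustration only, not the paper's setting.

import numpy as np

# Toy target of the general shape "polynomial of multiple nonlinear features":
# two quadratic features p1, p2 of a Gaussian input, combined by g(z1, z2).
rng = np.random.default_rng(1)
d, n = 20, 1000
x = rng.standard_normal((n, d))                       # Gaussian inputs

A1 = rng.standard_normal((d, d)); A1 = (A1 + A1.T) / 2
A2 = rng.standard_normal((d, d)); A2 = (A2 + A2.T) / 2
p1 = np.einsum("ni,ij,nj->n", x, A1, x) / d           # nonlinear feature 1
p2 = np.einsum("ni,ij,nj->n", x, A2, x) / d           # nonlinear feature 2
y = p1 * p2 + p1 ** 2                                 # g(z1, z2) = z1*z2 + z1^2

A three-layer network would then be fit to the pairs (x, y).
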
The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode …
External link:
http://arxiv.org/abs/2402.14735
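
Since the snippet above (arXiv:2402.14735) turns on the self-attention mechanism itself, here is a minimal single-head scaled dot-product self-attention in numpy; this is a generic textbook sketch (no masking, no multiple heads), not the paper's construction.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no masking).

    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head).
    Each output position is a weighted average of value vectors, with weights
    given by a softmax over query-key dot products, which is how information
    moves between positions of the sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over keys
    return attn @ V                                  # (seq_len, d_head)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                      # toy sequence of length 5
Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 4)
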
We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$ where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ …
External link:
http://arxiv.org/abs/2311.13774
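
As a concrete (assumed) instance of the target class $h = g \circ p$ from the snippet above (arXiv:2311.13774), the sketch below draws standard Gaussian inputs, applies a quadratic polynomial p, and composes it with a simple link g; both choices are illustrative rather than the paper's.

import numpy as np

# Toy instance of h = g o p over the standard Gaussian: p is a degree-2
# polynomial feature and g is applied on top. Choices are illustrative only.
rng = np.random.default_rng(2)
d, n = 50, 5000
x = rng.standard_normal((n, d))            # x ~ N(0, I_d)

A = rng.standard_normal((d, d)); A = (A + A.T) / 2
p = np.einsum("ni,ij,nj->n", x, A, x) / d  # p : R^d -> R, quadratic
y = np.tanh(p)                             # h(x) = g(p(x)) with g = tanh
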
Author:
Malladi, Sadhika, Gao, Tianyu, Nichani, Eshaan, Damian, Alex, Lee, Jason D., Chen, Danqi, Arora, Sanjeev
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two …
External link:
http://arxiv.org/abs/2305.17333
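
The snippet above (arXiv:2305.17333) describes zeroth-order methods that estimate gradients from loss evaluations alone. Below is a classical two-point (SPSA-style) estimator along a random direction, sketched on a toy quadratic loss; it illustrates the general idea and is not the specific algorithm of the paper above.

import numpy as np

def zo_gradient(loss, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate.

    Uses only two loss evaluations: the finite difference
    (loss(theta + eps*z) - loss(theta - eps*z)) / (2*eps) scales a random
    Gaussian direction z, giving an unbiased estimate of the gradient
    in expectation (up to O(eps^2) smoothing error).
    """
    if rng is None:
        rng = np.random.default_rng()
    z = rng.standard_normal(theta.shape)
    scale = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    return scale * z

# Toy check on a quadratic loss whose true gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
loss = lambda t: float(np.sum(t ** 2))
est = np.mean([zo_gradient(loss, theta, rng=np.random.default_rng(s))
               for s in range(2000)], axis=0)
print(est, 2 * theta)   # the averaged estimate approaches the true gradient
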
We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information …
External link:
http://arxiv.org/abs/2305.10633
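
For the single index model $\sigma(w^\star \cdot x)$ defined in the snippet above (arXiv:2305.10633), here is a short data-generation sketch. The unit-norm direction and the link function (the third Hermite polynomial, $z^3 - 3z$) are illustrative choices, not the paper's specific setting.

import numpy as np

rng = np.random.default_rng(3)
d, n = 100, 10000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)           # unit-norm hidden direction

x = rng.standard_normal((n, d))            # isotropic Gaussian inputs
sigma = lambda z: z ** 3 - 3 * z           # illustrative link function
y = sigma(x @ w_star)                      # labels from the single index model
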
One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the …
External link:
http://arxiv.org/abs/2305.06986
Author:
Rachel Isba, Darren M Ashcroft, Alastair D Hay, Elliot Heward, Iain A Bruce, Judith Lunn, John Molloy, Jaya R Nichani, James Birkenshaw-Dempsey
Published in:
BMJ Paediatrics Open, Vol 8, Iss 1 (2024)
Background: Acute otitis media with discharge (AOMd) results from a tympanic membrane perforation secondary to a middle ear infection. Currently, the impact of AOMd on children and young people (CYP) and their families is not well understood. There is …
External link:
https://doaj.org/article/ab4dada39532471385eca54019b88d45
Author:
Anurag Satpathy, Vishakha Grover, Ashish Kumar, Ashish Jain, Dharmarajan Gopalakrishnan, Harpreet Singh Grover, Abhay Kolte, Anil Melath, Manish Khatri, Nitin Dani, Roshani Thakur, Vaibhav Tiwari, Vikender Singh Yadav, Biju Thomas, Gurparkash Singh Chahal, Meenu Taneja Bhasin, Nymphea Pandit, Sandeep Anant Lawande, R. G. Shiva Manjunath, Surinder Sachdeva, Amit Bhardwaj, Avni Raju Pradeep, Ashish Sham Nichani, Baljeet Singh, P. R. Ganesh, Neeraj Chandrahas Deshpande, Saravanan Sampoornam Pape Reddy, Subash Chandra Raj
Published in:
Journal of Indian Society of Periodontology, Vol 28, Iss 1, Pp 6-31 (2024)
Current implant therapy is a frequently employed treatment for individuals who have lost teeth, as it offers functional and biological advantages over old prostheses. Concurrently, active exploration of intervention strategies aims to prevent the …
External link:
https://doaj.org/article/2f0e4b4616304980b18ca4375c6c8c0e
Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is "stable" and the training loss decreases monotonically. Recent works, however, …
External link:
http://arxiv.org/abs/2209.15594
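
The stability claim quoted above (arXiv:2209.15594) can be checked on a one-dimensional quadratic, where the sharpness is constant: gradient descent contracts exactly when $S < 2/\eta$ and diverges above that threshold. The sketch below is this classical calculation, not the paper's analysis.

import numpy as np

def gd_on_quadratic(sharpness, lr, steps=50, theta0=1.0):
    """Gradient descent on L(theta) = 0.5 * sharpness * theta^2.

    The Hessian (hence the sharpness) is the constant `sharpness`, and the
    update is theta <- (1 - lr * sharpness) * theta, which contracts exactly
    when sharpness < 2 / lr, the classical stability threshold in the snippet.
    """
    theta = theta0
    for _ in range(steps):
        theta -= lr * sharpness * theta
    return theta

lr = 0.1
print(gd_on_quadratic(sharpness=15.0, lr=lr))   # 15 < 2/0.1 = 20: decays to ~0
print(gd_on_quadratic(sharpness=25.0, lr=lr))   # 25 > 20: |theta| blows up
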