Výsledky vyhledávání - "Bernstein, Jeremy"

Report

Autor: Bernstein, Jeremy, Newhouse, Laker

An old idea in optimization theory says that since the gradient is a dual vector it may not be subtracted from the weights without first being mapped to the primal space where the weights reside. We take this idea seriously in this paper and construc

Externí odkaz: http://arxiv.org/abs/2410.21265

Zobrazit plný text záznamu

Report

Old Optimizer, New Norm: An Anthology

Autor: Bernstein, Jeremy, Newhouse, Laker

Deep learning optimizers are often motivated through a mix of convex and approximate second-order theory. We select three such methods -- Adam, Shampoo and Prodigy -- and argue that each method can instead be understood as a squarely first-order meth

Externí odkaz: http://arxiv.org/abs/2409.20325

Zobrazit plný text záznamu

Report

Scalable Optimization in the Modular Norm

Autor: Large, Tim, Liu, Yang, Huh, Minyoung, Bahng, Hyojin, Isola, Phillip, Bernstein, Jeremy

To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked

Externí odkaz: http://arxiv.org/abs/2405.14813

Zobrazit plný text záznamu

Report

Training Neural Networks from Scratch with Parallel Low-Rank Adapters

Autor: Huh, Minyoung, Cheung, Brian, Bernstein, Jeremy, Isola, Phillip, Agrawal, Pulkit

The scalability of deep learning models is fundamentally limited by computing resources, memory, and communication. Although methods like low-rank adaptation (LoRA) have reduced the cost of model finetuning, its application in model pre-training rema

Externí odkaz: http://arxiv.org/abs/2402.16828

Zobrazit plný text záznamu

Report

A Spectral Condition for Feature Learning

Autor: Yang, Greg, Simon, James B., Bernstein, Jeremy

The push to train ever larger neural networks has motivated the study of initialization and training at large network width. A key challenge is to scale training so that a network's internal representations evolve nontrivially at all widths, a proces

Externí odkaz: http://arxiv.org/abs/2310.17813

Zobrazit plný text záznamu

Kniha

Quantum Profiles : Second Edition. [elektronicky zdroj]

Autor: Bernstein, Jeremy

Externí odkaz: Kolekce e-knih KNAV Registrovani uzivatele: plny text 5 minut, dalsi pristup na vyzadani. Registered users: full text online 5 minutes, further access on request.

Dissertation/ Thesis

Optimisation & Generalisation in Networks of Neurons

Autor: Bernstein, Jeremy David

The goal of this thesis is to develop the optimisation and generalisation theoretic foundations of learning in artificial neural networks. The thesis tackles two central questions. Given training data and a network architecture: Which weight setting

Externí odkaz: https://thesis.library.caltech.edu/15041/1/bernstein_thesis.pdf

Zobrazit plný text záznamu

Report

Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models

Autor: Heiding, Fredrik, Schneier, Bruce, Vishwanath, Arun, Bernstein, Jeremy, Park, Peter S.

AI programs, built using large language models, make it possible to automatically create phishing emails based on a few data points about a user. They stand in contrast to traditional phishing emails that hackers manually design using general rules g

Externí odkaz: http://arxiv.org/abs/2308.12287

Zobrazit plný text záznamu

Report

SketchOGD: Memory-Efficient Continual Learning

Autor: Wright, Benjamin, Min, Youngjae, Bernstein, Jeremy, Azizan, Navid

When machine learning models are trained continually on a sequence of tasks, they are liable to forget what they learned on previous tasks -- a phenomenon known as catastrophic forgetting. Proposed solutions to catastrophic forgetting tend to involve

Externí odkaz: http://arxiv.org/abs/2305.16424

Zobrazit plný text záznamu

Report

Automatic Gradient Descent: Deep Learning without Hyperparameters

Autor: Bernstein, Jeremy, Mingard, Chris, Huang, Kevin, Azizan, Navid, Yue, Yisong

The architecture of a deep neural network is defined explicitly in terms of the number of layers, the width of each layer and the general network topology. Existing optimisation frameworks neglect this information in favour of implicit architectural

Externí odkaz: http://arxiv.org/abs/2304.05187

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání