Showing 1 - 10 of 55 for the search: '"Vyas, Nikhil"'
Author:
Vyas, Nikhil, Morwani, Depen, Zhao, Rosie, Shapira, Itai, Brandfonbrener, David, Janson, Lucas, Kakade, Sham
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam…
External link:
http://arxiv.org/abs/2409.11321
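To make the overhead comparison concrete, below is a minimal NumPy sketch of the classic Shampoo update for a matrix-shaped parameter (following the original Shampoo formulation, not this paper's variant): each step accumulates two Gram matrices and takes two matrix inverse fourth roots, versus Adam's purely elementwise work.

```python
import numpy as np

def inverse_fourth_root(M, eps=1e-12):
    """Compute M^{-1/4} for a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, eps, None) ** -0.25) @ V.T

def shampoo_step(W, G, L, R, lr=1e-3):
    """One Shampoo update for a matrix parameter W with gradient G.

    L and R accumulate row- and column-space gradient statistics; the
    preconditioned gradient is L^{-1/4} G R^{-1/4}.
    """
    L += G @ G.T                  # left (row-space) statistics
    R += G.T @ G                  # right (column-space) statistics
    W -= lr * inverse_fourth_root(L) @ G @ inverse_fourth_root(R)
    return W, L, R

# Toy usage: one step on a random 64 x 32 parameter.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
L = np.eye(64) * 1e-4             # small epsilon initialization
R = np.eye(32) * 1e-4
W, L, R = shampoo_step(W, G, L, R)
```

The two eigendecompositions per step are the "computational overhead" the snippet refers to; Adam touches each parameter once with scalar arithmetic.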
We construct 2-query, quasi-linear-size probabilistically checkable proofs (PCPs) with arbitrarily small constant soundness, improving upon Dinur's 2-query quasi-linear-size PCPs with soundness $1-\Omega(1)$. As an immediate corollary, we get that…
External link:
http://arxiv.org/abs/2407.12762
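Schematically, the claim in the snippet has the following shape (the completeness parameter and the exact size bound are assumptions of this sketch; see the paper for the precise statement):

```latex
% 2-query PCPs of quasi-linear size with arbitrarily small constant
% soundness (Dinur's construction achieves soundness 1 - Omega(1)).
\[
  \forall \varepsilon > 0:\quad
  \mathsf{SAT} \in \mathsf{PCP}_{1,\,\varepsilon}
  \big[\, \text{proof length } n \cdot \mathrm{polylog}(n),\ 2 \text{ queries} \,\big]
\]
```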
Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most effective…
External link:
http://arxiv.org/abs/2407.07972
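For reference, since the snippet treats Adam as the baseline, here is the standard Adam update (the textbook algorithm, nothing specific to this paper); note that it is elementwise and cheap per parameter:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam update: elementwise moments, O(#params) work per step."""
    m = b1 * m + (1 - b1) * g          # first-moment EMA
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    m_hat = m / (1 - b1 ** t)          # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```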
Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation…
External link:
http://arxiv.org/abs/2406.17748
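One standard way to make "approximation" precise comes from the original Shampoo analysis, where the Kronecker-factored statistics stand in for full-matrix AdaGrad's accumulator; the vec identity below is standard linear algebra, while treating it as the view this paper studies is an assumption of this sketch:

```latex
% Full-matrix AdaGrad preconditions vec(G_t) with (sum_s g_s g_s^T)^{1/2};
% Shampoo replaces that statistic with a Kronecker-factored surrogate.
\[
  L_t = \sum_{s \le t} G_s G_s^\top, \qquad
  R_t = \sum_{s \le t} G_s^\top G_s,
\]
\[
  L_t^{-1/4}\, G_t\, R_t^{-1/4}
  \;\longleftrightarrow\;
  \big(R_t^{1/4} \otimes L_t^{1/4}\big)^{-1} \operatorname{vec}(G_t),
\]
so the Kronecker product $R_t^{1/4} \otimes L_t^{1/4}$ plays the role of the
full-matrix statistic $\big(\sum_{s \le t} g_s g_s^\top\big)^{1/2}$
with $g_s = \operatorname{vec}(G_s)$.
```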
We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text…
External link:
http://arxiv.org/abs/2402.03563
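For intuition about the epistemic/aleatoric split, here is the standard ensemble-based entropy decomposition in NumPy; this is a generic illustration of the distinction, not necessarily the method the paper uses for free-form LLM outputs.

```python
import numpy as np

def uncertainty_decomposition(probs):
    """Split predictive uncertainty for an ensemble of categorical predictors.

    probs: array of shape (n_members, n_classes), each row a distribution.
    Total entropy = aleatoric (mean per-member entropy)
                  + epistemic (mutual information between label and member).
    """
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum()
    aleatoric = -(probs * np.log(probs + eps)).sum(axis=1).mean()
    return total, aleatoric, total - aleatoric

# Members that agree on a uniform answer: high aleatoric, ~zero epistemic.
agree = np.tile([0.5, 0.5], (4, 1))
# Members that confidently disagree: low aleatoric, high epistemic.
disagree = np.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]])
print(uncertainty_decomposition(agree))
print(uncertainty_decomposition(disagree))
```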
Author:
Vyas, Nikhil
In this thesis we study satisfiability algorithms and connections between algorithms and circuit lower bounds. We give new results in the following three areas: Oracles and Algorithmic Methods for Proving Lower Bounds: We give an equivalence between…
In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that…
External link:
http://arxiv.org/abs/2307.12941
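A toy illustration of why "privileged basis" is testable at all (a hypothetical setup, not the paper's experiment): if individual neuron directions are meaningful, per-coordinate activation statistics should look different in the neuron basis than after a random rotation of that basis.

```python
import numpy as np
from scipy.stats import kurtosis, ortho_group

rng = np.random.default_rng(0)

# Stand-in "activations": sparse in the neuron basis (a privileged basis).
acts = rng.normal(size=(10_000, 64)) * (rng.random((10_000, 64)) < 0.1)

# A random orthogonal change of basis destroys axis alignment.
Q = ortho_group.rvs(64, random_state=0)
rotated = acts @ Q

# Excess kurtosis per coordinate: high for privileged axes, ~0 after rotation.
print("neuron basis :", kurtosis(acts, axis=0).mean())
print("rotated basis:", kurtosis(rotated, axis=0).mean())
```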
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online learning…
External link:
http://arxiv.org/abs/2306.08590
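A toy probe of the snippet's question (illustrative assumptions: a noiseless linear teacher, fresh samples each step, and the linear learning-rate scaling rule): compare small-batch and large-batch single-pass SGD at an equal data budget.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
w_star = rng.normal(size=d)

def online_sgd(batch, base_lr=0.005, n_examples=20_000):
    """Single-pass SGD on fresh data from a noiseless linear teacher.

    The learning rate follows the linear scaling rule (lr ~ batch size)
    so the two runs are comparable at an equal data budget.
    """
    w = np.zeros(d)
    lr = base_lr * batch
    for _ in range(n_examples // batch):
        X = rng.normal(size=(batch, d))
        y = X @ w_star
        w -= lr * (X.T @ (X @ w - y)) / batch
    return float(np.mean((w - w_star) ** 2))

# Equal data budget, scaled lr: how much does batch=1 "SGD noise" matter?
print("batch=1  :", online_sgd(batch=1))
print("batch=100:", online_sgd(batch=100))
```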
Author:
Vyas, Nikhil, Atanasov, Alexander, Bordelon, Blake, Morwani, Depen, Sainathan, Sabarish, Pehlevan, Cengiz
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their…
External link:
http://arxiv.org/abs/2305.18411
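A scaled-down analogue of the width experiment (a hypothetical toy, not the paper's setup): train two mean-field-parameterized two-layer networks that differ only in width on an identical online data stream, then check how strongly their test predictions agree.

```python
import numpy as np

d = 20
X_probe = np.random.default_rng(0).normal(size=(256, d))

def train(width, steps=2_000, lr=0.5, batch=32):
    """Online SGD on a mean-field two-layer ReLU net; same stream per width."""
    data = np.random.default_rng(1)        # identical stream across widths
    init = np.random.default_rng(width)    # width-specific random init
    W = init.normal(size=(width, d)) / np.sqrt(d)
    a = init.normal(size=width)
    for _ in range(steps):
        X = data.normal(size=(batch, d))
        y = np.sin(2.0 * X[:, 0])                    # toy target
        H = np.maximum(X @ W.T, 0.0)                 # (batch, width)
        err = (H @ a / width - y) / batch
        # Mean-field parameterization: per-parameter grads carry 1/width and
        # the lr carries width, so function-space dynamics are (approximately)
        # width-independent, which is what makes the widths comparable.
        a -= lr * (H.T @ err)
        W -= lr * ((err[:, None] * (H > 0.0) * a[None, :]).T @ X)
    return np.maximum(X_probe @ W.T, 0.0) @ a / width

f_narrow, f_wide = train(512), train(4096)
print("prediction agreement (corr):", np.corrcoef(f_narrow, f_wide)[0, 1])
```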
There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set. We give a formal definition of near access-freeness (NAF)…
External link:
http://arxiv.org/abs/2302.10870
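The snippet cuts off at the definition; schematically, access-freeness-style definitions bound how much the model's conditional output distribution can depend on any one copyrighted item (the notation below is this sketch's paraphrase, not a quotation of the paper):

```latex
% Schematic: p is k_x-NAF w.r.t. copyrighted datum C if its conditional
% output distribution stays close to that of a "safe" model that was
% trained without access to C, for every prompt x.
\[
  \Delta\!\big(\, p(\cdot \mid x) \ \big\|\ \mathrm{safe}_C(\cdot \mid x) \,\big)
  \;\le\; k_x
  \quad \text{for every prompt } x,
\]
for a divergence $\Delta$ such as the maximum (R\'enyi-$\infty$) KL divergence.
```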