Showing 1 - 10 of 55 for the search: '"Vyas, Nikhil"'
Author:
Vyas, Nikhil, Morwani, Depen, Zhao, Rosie, Shapira, Itai, Brandfonbrener, David, Janson, Lucas, Kakade, Sham
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam…
External link:
http://arxiv.org/abs/2409.11321
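To make the overhead comparison concrete, below is a minimal NumPy sketch of the classic Shampoo update for a matrix-shaped parameter (following the original Shampoo formulation, not this paper's variant): each step accumulates two Gram matrices and takes two matrix inverse fourth roots, versus Adam's purely elementwise work.

```python
import numpy as np

def inverse_fourth_root(M, eps=1e-12):
    """Compute M^{-1/4} for a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, eps, None) ** -0.25) @ V.T

def shampoo_step(W, G, L, R, lr=1e-3):
    """One Shampoo update for a matrix parameter W with gradient G.

    L and R accumulate row- and column-space gradient statistics; the
    preconditioned gradient is L^{-1/4} G R^{-1/4}.
    """
    L += G @ G.T                  # left (row-space) statistics
    R += G.T @ G                  # right (column-space) statistics
    W -= lr * inverse_fourth_root(L) @ G @ inverse_fourth_root(R)
    return W, L, R

# Toy usage: one step on a random 64 x 32 parameter.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
G = rng.normal(size=(64, 32))
L = np.eye(64) * 1e-4             # small epsilon initialization
R = np.eye(32) * 1e-4
W, L, R = shampoo_step(W, G, L, R)
```

The two eigendecompositions per step are the "computational overhead" the snippet refers to; Adam touches each parameter once with scalar arithmetic.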
We construct 2-query, quasi-linear-size probabilistically checkable proofs (PCPs) with arbitrarily small constant soundness, improving upon Dinur's 2-query quasi-linear-size PCPs with soundness $1-\Omega(1)$. As an immediate corollary, we get that…
External link:
http://arxiv.org/abs/2407.12762
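Schematically, the claim in the snippet has the following shape (the completeness parameter and the exact size bound are assumptions of this sketch; see the paper for the precise statement):

```latex
% 2-query PCPs of quasi-linear size with arbitrarily small constant
% soundness (Dinur's construction achieves soundness 1 - Omega(1)).
\[
  \forall \varepsilon > 0:\quad
  \mathsf{SAT} \in \mathsf{PCP}_{1,\,\varepsilon}
  \big[\, \text{proof length } n \cdot \mathrm{polylog}(n),\ 2 \text{ queries} \,\big]
\]
```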
Training language models becomes increasingly expensive with scale, prompting numerous attempts to improve optimization efficiency. Despite these efforts, the Adam optimizer remains the most widely used, due to a prevailing view that it is the most effective…
External link:
http://arxiv.org/abs/2407.07972
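For reference, since the snippet treats Adam as the baseline, here is the standard Adam update (the textbook algorithm, nothing specific to this paper); note that it is elementwise and cheap per parameter:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam update: elementwise moments, O(#params) work per step."""
    m = b1 * m + (1 - b1) * g          # first-moment EMA
    v = b2 * v + (1 - b2) * g * g      # second-moment EMA
    m_hat = m / (1 - b1 ** t)          # bias correction (t starts at 1)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```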
Shampoo, a second-order optimization algorithm which uses a Kronecker product preconditioner, has recently garnered increasing attention from the machine learning community. The preconditioner used by Shampoo can be viewed either as an approximation…
External link:
http://arxiv.org/abs/2406.17748
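One standard way to make "approximation" precise comes from the original Shampoo analysis, where the Kronecker-factored statistics stand in for full-matrix AdaGrad's accumulator; the vec identity below is standard linear algebra, while treating it as the view this paper studies is an assumption of this sketch:

```latex
% Full-matrix AdaGrad preconditions vec(G_t) with (sum_s g_s g_s^T)^{1/2};
% Shampoo replaces that statistic with a Kronecker-factored surrogate.
\[
  L_t = \sum_{s \le t} G_s G_s^\top, \qquad
  R_t = \sum_{s \le t} G_s^\top G_s,
\]
\[
  L_t^{-1/4}\, G_t\, R_t^{-1/4}
  \;\longleftrightarrow\;
  \big(R_t^{1/4} \otimes L_t^{1/4}\big)^{-1} \operatorname{vec}(G_t),
\]
so the Kronecker product $R_t^{1/4} \otimes L_t^{1/4}$ plays the role of the
full-matrix statistic $\big(\sum_{s \le t} g_s g_s^\top\big)^{1/2}$
with $g_s = \operatorname{vec}(G_s)$.
```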
We study the feasibility of identifying epistemic uncertainty (reflecting a lack of knowledge), as opposed to aleatoric uncertainty (reflecting entropy in the underlying distribution), in the outputs of large language models (LLMs) over free-form text…
External link:
http://arxiv.org/abs/2402.03563
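For intuition about the epistemic/aleatoric split, here is the standard ensemble-based entropy decomposition in NumPy; this is a generic illustration of the distinction, not necessarily the method the paper uses for free-form LLM outputs.

```python
import numpy as np

def uncertainty_decomposition(probs):
    """Split predictive uncertainty for an ensemble of categorical predictors.

    probs: array of shape (n_members, n_classes), each row a distribution.
    Total entropy = aleatoric (mean per-member entropy)
                  + epistemic (mutual information between label and member).
    """
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum()
    aleatoric = -(probs * np.log(probs + eps)).sum(axis=1).mean()
    return total, aleatoric, total - aleatoric

# Members that agree on a uniform answer: high aleatoric, ~zero epistemic.
agree = np.tile([0.5, 0.5], (4, 1))
# Members that confidently disagree: low aleatoric, high epistemic.
disagree = np.array([[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]])
print(uncertainty_decomposition(agree))
print(uncertainty_decomposition(disagree))
```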
Author:
Vyas, Nikhil
In this thesis we study satisfiability algorithms and connections between algorithms and circuit lower bounds. We give new results in the following three areas: Oracles and Algorithmic Methods for Proving Lower Bounds: We give an equivalence between…
In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that…
External link:
http://arxiv.org/abs/2307.12941
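A toy illustration of why "privileged basis" is testable at all (a hypothetical setup, not the paper's experiment): if individual neuron directions are meaningful, per-coordinate activation statistics should look different in the neuron basis than after a random rotation of that basis.

```python
import numpy as np
from scipy.stats import kurtosis, ortho_group

rng = np.random.default_rng(0)

# Stand-in "activations": sparse in the neuron basis (a privileged basis).
acts = rng.normal(size=(10_000, 64)) * (rng.random((10_000, 64)) < 0.1)

# A random orthogonal change of basis destroys axis alignment.
Q = ortho_group.rvs(64, random_state=0)
rotated = acts @ Q

# Excess kurtosis per coordinate: high for privileged axes, ~0 after rotation.
print("neuron basis :", kurtosis(acts, axis=0).mean())
print("rotated basis:", kurtosis(rotated, axis=0).mean())
```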
The success of SGD in deep learning has been ascribed by prior works to the implicit bias induced by finite batch sizes ("SGD noise"). While prior works focused on offline learning (i.e., multiple-epoch training), we study the impact of SGD noise on online learning…
External link:
http://arxiv.org/abs/2306.08590
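A toy probe of the snippet's question (illustrative assumptions: a noiseless linear teacher, fresh samples each step, and the linear learning-rate scaling rule): compare small-batch and large-batch single-pass SGD at an equal data budget.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
w_star = rng.normal(size=d)

def online_sgd(batch, base_lr=0.005, n_examples=20_000):
    """Single-pass SGD on fresh data from a noiseless linear teacher.

    The learning rate follows the linear scaling rule (lr ~ batch size)
    so the two runs are comparable at an equal data budget.
    """
    w = np.zeros(d)
    lr = base_lr * batch
    for _ in range(n_examples // batch):
        X = rng.normal(size=(batch, d))
        y = X @ w_star
        w -= lr * (X.T @ (X @ w - y)) / batch
    return float(np.mean((w - w_star) ** 2))

# Equal data budget, scaled lr: how much does batch=1 "SGD noise" matter?
print("batch=1  :", online_sgd(batch=1))
print("batch=100:", online_sgd(batch=100))
```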
Author:
Vyas, Nikhil, Atanasov, Alexander, Bordelon, Blake, Morwani, Depen, Sainathan, Sabarish, Pehlevan, Cengiz
We study the effect of width on the dynamics of feature-learning neural networks across a variety of architectures and datasets. Early in training, wide neural networks trained on online data have not only identical loss curves but also agree in their…
External link:
http://arxiv.org/abs/2305.18411
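A scaled-down analogue of the width experiment (a hypothetical toy, not the paper's setup): train two mean-field-parameterized two-layer networks that differ only in width on an identical online data stream, then check how strongly their test predictions agree.

```python
import numpy as np

d = 20
X_probe = np.random.default_rng(0).normal(size=(256, d))

def train(width, steps=2_000, lr=0.5, batch=32):
    """Online SGD on a mean-field two-layer ReLU net; same stream per width."""
    data = np.random.default_rng(1)        # identical stream across widths
    init = np.random.default_rng(width)    # width-specific random init
    W = init.normal(size=(width, d)) / np.sqrt(d)
    a = init.normal(size=width)
    for _ in range(steps):
        X = data.normal(size=(batch, d))
        y = np.sin(2.0 * X[:, 0])                    # toy target
        H = np.maximum(X @ W.T, 0.0)                 # (batch, width)
        err = (H @ a / width - y) / batch
        # Mean-field parameterization: per-parameter grads carry 1/width and
        # the lr carries width, so function-space dynamics are (approximately)
        # width-independent, which is what makes the widths comparable.
        a -= lr * (H.T @ err)
        W -= lr * ((err[:, None] * (H > 0.0) * a[None, :]).T @ X)
    return np.maximum(X_probe @ W.T, 0.0) @ a / width

f_narrow, f_wide = train(512), train(4096)
print("prediction agreement (corr):", np.corrcoef(f_narrow, f_wide)[0, 1])
```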
There is a growing concern that learned conditional generative models may output samples that are substantially similar to some copyrighted data $C$ that was in their training set. We give a formal definition of near access-freeness (NAF)…
External link:
http://arxiv.org/abs/2302.10870
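The snippet cuts off at the definition; schematically, access-freeness-style definitions bound how much the model's conditional output distribution can depend on any one copyrighted item (the notation below is this sketch's paraphrase, not a quotation of the paper):

```latex
% Schematic: p is k_x-NAF w.r.t. copyrighted datum C if its conditional
% output distribution stays close to that of a "safe" model that was
% trained without access to C, for every prompt x.
\[
  \Delta\!\big(\, p(\cdot \mid x) \ \big\|\ \mathrm{safe}_C(\cdot \mid x) \,\big)
  \;\le\; k_x
  \quad \text{for every prompt } x,
\]
for a divergence $\Delta$ such as the maximum (R\'enyi-$\infty$) KL divergence.
```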