Showing 1 - 10 of 1,735 for search: '"A. Soudry"'
Implicit bias describes the phenomenon where optimization-based training algorithms, without explicit regularization, show a preference for simple estimators even when more complex estimators have equal objective values. Multiple works have developed … (a toy illustration of this preference follows this entry).
External link:
http://arxiv.org/abs/2411.01350
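The preference described in the snippet above can be seen in a toy setting. The sketch below is my own illustration, not taken from the linked paper: plain gradient descent on an underdetermined least-squares problem, started from zero, converges to the minimum-norm interpolant rather than to an arbitrary one of the infinitely many solutions that fit the data. The shapes, step size, and iteration count are assumptions chosen only to make the demo run.

```python
# Hedged toy example of implicit bias: gradient descent on an underdetermined
# least-squares problem, initialized at zero, converges to the minimum-norm
# solution among the infinitely many that interpolate the data exactly.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                         # more parameters than samples
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                        # zero init keeps iterates in the row space of X
lr = 1.0 / np.linalg.norm(X, 2) ** 2   # safe step size for the quadratic loss
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)        # gradient of 0.5 * ||X w - y||^2

w_min_norm = np.linalg.pinv(X) @ y     # the minimum-L2-norm interpolator
print(np.max(np.abs(X @ w - y)))       # ~0: gradient descent interpolates the data
print(np.linalg.norm(w - w_min_norm))  # ~0: and it picks the minimum-norm solution
```

The implicit-bias literature studies the same qualitative effect in far richer settings (logistic loss, deep networks); this is only the simplest case where the bias can be verified in a few lines.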
Author:
Ginzburg, David, Soudry, David
We determine the poles of the Eisenstein series on a general linear group, induced from two Speh representations, $\Delta(\tau,m_1)|\cdot|^s\times\Delta(\tau,m_2)|\cdot|^{-s}$, $\mathrm{Re}(s)\geq 0$, where $\tau$ is an irreducible, unitary, cuspidal, automorphic …
External link:
http://arxiv.org/abs/2410.23026
We study the overfitting behavior of fully connected deep Neural Networks (NNs) with binary weights fitted to perfectly classify a noisy training set. We consider interpolation using both the smallest NN (having the minimal number of weights) and a …
External link:
http://arxiv.org/abs/2410.19092
Many recent methods aim to merge neural networks (NNs) with identical architectures trained on different tasks to obtain a single multi-task model. Most existing works tackle the simpler setup of merging NNs initialized from a common pre-trained network … (a sketch of the simplest merging baseline follows this entry).
External link:
http://arxiv.org/abs/2410.01483
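As a point of reference for what "merging" means at its simplest, here is a hedged sketch of plain parameter averaging between two models with identical architectures. It is a generic baseline, not the method of the linked paper; the layer sizes and the `alpha` weight are arbitrary illustrative assumptions.

```python
# Hedged sketch: naive weight averaging, the simplest NN-merging baseline.
# Purely illustrative; not the specific method of the linked paper.
import torch

def average_merge(state_a: dict, state_b: dict, alpha: float = 0.5) -> dict:
    """Interpolate two state dicts with identical keys and tensor shapes."""
    return {k: alpha * state_a[k] + (1.0 - alpha) * state_b[k] for k in state_a}

# Two models with the same architecture (e.g., trained on different tasks).
model_a = torch.nn.Linear(16, 4)
model_b = torch.nn.Linear(16, 4)

merged = torch.nn.Linear(16, 4)
merged.load_state_dict(average_merge(model_a.state_dict(), model_b.state_dict()))
```

Averaging works only when the two sets of weights are already aligned (e.g., a shared pre-trained initialization); merging independently trained networks, as in the snippet, requires more than this.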
We train, for the first time, large language models using FP8 precision on datasets up to 2 trillion tokens -- a 20-fold increase over previous limits. Through these extended training runs, we uncover critical instabilities in FP8 training that were …
External link:
http://arxiv.org/abs/2409.12517
We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (e.g., NTK) are provably sub-optimal and benign overfitting does not happen, thus …
External link:
http://arxiv.org/abs/2406.06838
Author:
Buzaglo, Gon, Harel, Itamar, Nacson, Mor Shpigel, Brutzkus, Alon, Srebro, Nathan, Soudry, Daniel
Background. A main theoretical puzzle is why over-parameterized Neural Networks (NNs) generalize well when trained to zero loss (i.e., so they interpolate the data). Usually, the NN is trained with Stochastic Gradient Descent (SGD) or one of its variants …
External link:
http://arxiv.org/abs/2402.06323
The majority of the research on the quantization of Deep Neural Networks (DNNs) is focused on reducing the precision of tensors visible by high-level frameworks (e.g., weights, activations, and gradients). However, current hardware still relies on high… (a sketch of basic tensor quantization follows this entry).
External link:
http://arxiv.org/abs/2401.14110
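For context on what "reducing the precision of tensors visible by high-level frameworks" looks like in practice, below is a hedged sketch of symmetric per-tensor int8 weight quantization. The function names and the 256x256 shape are my own illustrative assumptions, and the linked paper is concerned with lower-level arithmetic than this framework-visible step.

```python
# Hedged sketch: symmetric per-tensor int8 quantization of a weight matrix --
# the kind of framework-level tensor quantization the snippet refers to.
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map floats to int8 codes with a single per-tensor scale (assumes w is not all zeros)."""
    scale = np.max(np.abs(w)) / 127.0          # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print(float(np.abs(w - dequantize(q, scale)).max()))  # rounding error bounded by scale / 2
```

Even after weights are stored in int8, the products are typically accumulated in a wider format in hardware, which is the gap the snippet alludes to.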
In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks. Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization. In contrast, our paper examines how task …
External link:
http://arxiv.org/abs/2401.12617
Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation. However, the success of these models is not well understood from a theoretical perspective. In this paper, we …
External link:
http://arxiv.org/abs/2311.06748