Showing 1 - 10 of 163 for search: '"Montúfar, Guido"'
Author: Liang, Shuang, Montúfar, Guido
We examine the implicit bias of mirror flow in univariate least squares error regression with wide and shallow neural networks. For a broad class of potential functions, we show that mirror flow exhibits lazy training and has the same implicit bias a…
External link: http://arxiv.org/abs/2410.03988
Bounds on the smallest eigenvalue of the neural tangent kernel (NTK) are a key ingredient in the analysis of neural network optimization and memorization. However, existing results require distributional assumptions on the data and are limited to a h…
External link: http://arxiv.org/abs/2405.14630
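The empirical NTK Gram matrix whose smallest eigenvalue this abstract refers to can be illustrated numerically. The following is my own toy sketch for a shallow ReLU network on scalar inputs, not the paper's setting; all sizes and the random initialization are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: f(x) = sum_k v_k * relu(w_k * x)
m, n = 50, 5                        # width, number of data points (illustrative)
w = rng.normal(size=m)
v = rng.normal(size=m) / np.sqrt(m)
x = rng.normal(size=n)

def grad(xi):
    # Gradient of f(xi) w.r.t. all parameters (w, v), stacked into one vector
    pre = w * xi                    # pre-activations w_k * x
    act = np.maximum(pre, 0.0)      # relu(w_k * x)
    dw = v * (pre > 0) * xi         # df/dw_k
    dv = act                        # df/dv_k
    return np.concatenate([dw, dv])

J = np.stack([grad(xi) for xi in x])   # n x 2m Jacobian
K = J @ J.T                            # empirical NTK Gram matrix, K_ij = <grad f(x_i), grad f(x_j)>
lam_min = np.linalg.eigvalsh(K)[0]     # smallest eigenvalue (eigvalsh returns ascending order)
print(lam_min)
```

Since K is a Gram matrix it is positive semidefinite by construction; the interesting question studied in such papers is how far its smallest eigenvalue stays from zero.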
Kakade's natural policy gradient method has been studied extensively in recent years, showing linear convergence with and without regularization. We study another natural gradient method which is based on the Fisher information matrix of the state-a…
External link: http://arxiv.org/abs/2403.19448
We consider a binary classifier defined as the sign of a tropical rational function, that is, as the difference of two convex piecewise linear functions. The parameter space of ReLU neural networks is contained as a semialgebraic set inside the param…
External link: http://arxiv.org/abs/2403.11871
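The classifier described in this abstract, the sign of a difference of two convex piecewise linear (max-affine) functions, can be sketched in a few lines. The weights below are hypothetical, chosen only to illustrate the construction.

```python
import numpy as np

# Hypothetical parameters of two max-affine functions p and q on R^2.
A = np.array([[1.0, 0.0], [0.0, 1.0]]); b = np.array([0.0, 0.0])   # defines p
C = np.array([[-1.0, -1.0]]);           d = np.array([0.5])        # defines q

def classify(x):
    # p and q are tropical polynomials: maxima of finitely many affine functions
    p = np.max(A @ x + b)   # p(x) = max_i (a_i . x + b_i), convex piecewise linear
    q = np.max(C @ x + d)   # q(x) = max_j (c_j . x + d_j)
    return np.sign(p - q)   # decision boundary is the set where p(x) = q(x)

print(classify(np.array([1.0, 1.0])))     # p = 1, q = -1.5, so the sign is +1
```

A ReLU network with integer-valued weights computes exactly such a difference of max-affine functions, which is what links the two parameter spaces in the abstract.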
The problem of benign overfitting asks whether it is possible for a model to perfectly fit noisy training data and still generalize well. We study benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classificat…
External link: http://arxiv.org/abs/2403.06903
Persistent homology (PH) is a method for generating topology-inspired representations of data. Empirical studies that investigate the properties of PH, such as its sensitivity to perturbations or ability to detect a feature of interest, commonly rely…
External link: http://arxiv.org/abs/2310.07073
We study the loss landscape of both shallow and deep, mildly overparameterized ReLU neural networks on a generic finite input dataset for the squared error loss. We show both by count and volume that most activation patterns correspond to parameter r…
External link: http://arxiv.org/abs/2305.19510
We define the supermodular rank of a function on a lattice. This is the smallest number of terms needed to decompose it into a sum of supermodular functions. The supermodular summands are defined with respect to different partial orders. We character…
External link: http://arxiv.org/abs/2305.14632
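The supermodularity condition underlying this abstract is easy to state concretely: on a lattice, f is supermodular if f(x ∨ y) + f(x ∧ y) ≥ f(x) + f(y) for all x, y. The brute-force checker below is my own minimal sketch for the lattice {0,1}^n with the componentwise order (join = elementwise max, meet = elementwise min), not the paper's method.

```python
from itertools import product

def is_supermodular(f, n):
    # Check f(join) + f(meet) >= f(x) + f(y) for all pairs on {0,1}^n
    pts = list(product((0, 1), repeat=n))
    for x in pts:
        for y in pts:
            join = tuple(max(a, b) for a, b in zip(x, y))  # x ∨ y
            meet = tuple(min(a, b) for a, b in zip(x, y))  # x ∧ y
            if f(join) + f(meet) < f(x) + f(y) - 1e-12:
                return False
    return True

print(is_supermodular(lambda x: x[0] * x[1], 2))      # the product x0*x1 is supermodular
print(is_supermodular(lambda x: -(x[0] * x[1]), 2))   # its negation (submodular) is not
```

The "rank" of the abstract then asks: how many supermodular summands, possibly with respect to different partial orders, are needed to write an arbitrary function as their sum.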
We study the geometry of linear networks with one-dimensional convolutional layers. The function spaces of these networks can be identified with semi-algebraic families of polynomials admitting sparse factorizations. We analyze the impact of the netw…
External link: http://arxiv.org/abs/2304.05752
We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much…
External link: http://arxiv.org/abs/2303.03027
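The Bures-Wasserstein distance between covariance matrices, the loss used in this abstract, has the closed form d²(A, B) = tr(A) + tr(B) − 2 tr((A^{1/2} B A^{1/2})^{1/2}). A minimal NumPy sketch of this formula (my own illustration, using an eigendecomposition-based square root for symmetric PSD matrices):

```python
import numpy as np

def psd_sqrt(M):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition
    vals, vecs = np.linalg.eigh(M)
    return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def bures_wasserstein_sq(A, B):
    # Squared BW distance: tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2})
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    return np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)

A = np.diag([1.0, 4.0])
B = np.diag([1.0, 1.0])
print(bures_wasserstein_sq(A, B))   # diagonal case reduces to sum_i (sqrt(a_i) - sqrt(b_i))^2 = 1.0
```

For commuting (e.g. diagonal) matrices the distance reduces to the 2-Wasserstein distance between the eigenvalue square roots, which makes small cases easy to verify by hand.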