Showing 1 - 10 of 328 for the search: '"Long Philip M"'
Author:
Long, Philip M., Bartlett, Peter L.
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) using a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around ...
External link:
http://arxiv.org/abs/2309.12488
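The $2/\eta$ threshold in the entry above can be made concrete. The sketch below is an illustration only, not code from the paper: it runs gradient descent on a hypothetical toy loss and estimates the "sharpness" (the largest Hessian eigenvalue) by power iteration on finite-difference Hessian-vector products, printing it next to $2/\eta$. The toy loss does not itself exhibit the edge-of-stability behavior; it only shows how the two quantities being compared can be computed.

```python
import numpy as np

# Illustration only (not from the paper): track sharpness = lambda_max(Hessian)
# during gradient descent and print it next to the 2/eta threshold.

def loss_grad(w):
    # gradient of the toy loss L(w) = 0.25*(w0^4 + w1^4) + 0.5*w0*w1 (hypothetical)
    return np.array([w[0] ** 3 + 0.5 * w[1], w[1] ** 3 + 0.5 * w[0]])

def sharpness(w, iters=50, fd_eps=1e-4):
    # power iteration; H v is approximated by a central difference of gradients
    v = np.array([1.0, 0.0])
    lam = 0.0
    for _ in range(iters):
        hv = (loss_grad(w + fd_eps * v) - loss_grad(w - fd_eps * v)) / (2 * fd_eps)
        lam = float(v @ hv)                  # Rayleigh quotient in the current direction
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return lam

eta = 0.05
w = np.array([2.0, -1.5])
for t in range(101):
    if t % 25 == 0:
        print(f"step {t:3d}   sharpness ~ {sharpness(w):6.2f}   2/eta = {2 / eta:.1f}")
    w = w - eta * loss_grad(w)
```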
Author:
Bartlett, Peter L., Long, Philip M.
We present a new general-purpose algorithm for learning classes of $[0,1]$-valued functions in a generalization of the prediction model, and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization ...
External link:
http://arxiv.org/abs/2304.11059
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective ...
External link:
http://arxiv.org/abs/2210.01513
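For reference, one common formulation of the SAM update (stated here as an assumption about the method, not taken from the paper above) first takes a normalized ascent step of radius $\rho$ and then descends using the gradient at the perturbed point. The sketch below applies it to a hypothetical convex quadratic objective of the kind the entry mentions.

```python
import numpy as np

# Minimal sketch of a standard SAM update on a convex quadratic L(w) = 0.5 * w @ A @ w.
# The quadratic, rho, and eta below are illustrative choices, not values from the paper.

A = np.diag([10.0, 1.0])          # quadratic with two distinct curvatures
grad = lambda w: A @ w            # gradient of the quadratic

rho, eta = 0.1, 0.05              # SAM perturbation radius and step size
w = np.array([1.0, 1.0])
for t in range(200):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step toward the locally "sharpest" point
    w = w - eta * grad(w + eps)                   # descend using the gradient at w + eps
print("final iterate:", w)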
Author:
Chatterji, Niladri S., Long, Philip M.
We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can ...
External link:
http://arxiv.org/abs/2209.09315
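The minimum $\ell_2$-norm interpolant used as a baseline in the entry above has a closed form in the overparameterized regime, $\hat{w} = X^{+} y$. The snippet below is a small illustration on hypothetical Gaussian data, not the paper's construction.

```python
import numpy as np

# Illustration: the minimum l2-norm interpolant w_hat = pinv(X) @ y in an
# overparameterized linear regression (d > n). Data below is hypothetical.

rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)   # noisy labels

w_hat = np.linalg.pinv(X) @ y     # minimum l2-norm solution of X w = y
print("training residual:", np.linalg.norm(X @ w_hat - y))   # ~0: it interpolates
print("norm of interpolant:", np.linalg.norm(w_hat))
```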
Author:
Long, Philip M., Servedio, Rocco A.
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by ...
External link:
http://arxiv.org/abs/2112.04590
Author:
Chatterji, Niladri S., Long, Philip M.
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime. We apply this result to obtain a lower bound for basis pursuit (the minimum $\ell_1$-norm interpolant) ...
External link:
http://arxiv.org/abs/2110.02914
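Basis pursuit, the minimum $\ell_1$-norm interpolant named above, can be written as a linear program. The sketch below is an illustration on hypothetical sparse data using SciPy's generic LP solver, not anything from the paper: it splits $w$ into positive and negative parts and minimizes their sum subject to exact interpolation.

```python
import numpy as np
from scipy.optimize import linprog

# Illustration: basis pursuit, min ||w||_1 subject to X w = y, as a linear program
# over the positive and negative parts of w. Data below is hypothetical.

rng = np.random.default_rng(0)
n, d = 20, 100
sparse_w = rng.normal(size=d) * (rng.random(d) < 0.05)      # a few nonzero coefficients
X = rng.normal(size=(n, d))
y = X @ sparse_w + 0.1 * rng.normal(size=n)                 # noisy labels

c = np.ones(2 * d)                          # objective: sum of positive + negative parts
A_eq = np.hstack([X, -X])                   # X (w_plus - w_minus) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
w_hat = res.x[:d] - res.x[d:]
print("interpolation residual:", np.linalg.norm(X @ w_hat - y))
print("l1 norm of interpolant:", np.abs(w_hat).sum())
```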
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ ...
External link:
http://arxiv.org/abs/2108.11489
Author:
Long, Philip M.
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks at initialization, whose embedding is the gradient of the output of the network with respect to its parameters. We study the "after kernel", which is ...
External link:
http://arxiv.org/abs/2105.10585
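As a concrete reading of the definition above, the tangent-kernel embedding of an input is the gradient of the network output with respect to the parameters, and the kernel is the inner product of two such embeddings. The snippet below uses a toy two-layer tanh network chosen purely for illustration, not the paper's architecture, and evaluates the kernel at the current parameters.

```python
import numpy as np

# Illustration: empirical tangent kernel of a tiny two-layer network
# f(x) = v . tanh(W x), with k(x, x') = <grad_theta f(x), grad_theta f(x')>.

rng = np.random.default_rng(0)
d, m = 3, 16                                  # input dimension and width (illustrative)
W = rng.normal(size=(m, d)) / np.sqrt(d)
v = rng.normal(size=m) / np.sqrt(m)

def param_grad(x):
    # gradient of the scalar output with respect to (W, v), flattened
    h = np.tanh(W @ x)
    dv = h                                    # d f / d v
    dW = np.outer(v * (1 - h ** 2), x)        # d f / d W
    return np.concatenate([dW.ravel(), dv])

def tangent_kernel(x1, x2):
    return param_grad(x1) @ param_grad(x2)

x, xp = rng.normal(size=d), rng.normal(size=d)
print("k(x, x') =", tangent_kernel(x, xp))
```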
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies for smoothed approximations to the ReLU, such as Swish and ...
External link:
http://arxiv.org/abs/2102.04998
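For context, Swish is the smoothed ReLU mentioned above, $\mathrm{swish}(x) = x \cdot \sigma(x)$ with $\sigma$ the sigmoid. The small sketch below simply tabulates it and its derivative next to the ReLU; it is an illustration of the activation, not part of the paper's analysis.

```python
import numpy as np

# Illustration: the Swish activation swish(x) = x * sigmoid(x) and its derivative.
# Unlike the ReLU, both are smooth everywhere (no kink at 0).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def swish_prime(x):
    s = sigmoid(x)
    return s + x * s * (1 - s)    # d/dx [x * sigmoid(x)]

for x in np.linspace(-4, 4, 9):
    print(f"x = {x:5.1f}   relu = {max(x, 0):5.2f}   swish = {swish(x):5.2f}   swish' = {swish_prime(x):5.2f}")
```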
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain ...
External link:
http://arxiv.org/abs/2012.02409