Showing 1 - 10 of 328 for the search: '"Long Philip M"'
Author:
Long, Philip M., Bartlett, Peter L.
Recent experiments have shown that, often, when training a neural network with gradient descent (GD) using a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around ...
External link:
http://arxiv.org/abs/2309.12488
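The $2/\eta$ threshold in the entry above can be made concrete. The sketch below is an illustration only, not code from the paper: it runs gradient descent on a hypothetical toy loss and estimates the "sharpness" (the largest Hessian eigenvalue) by power iteration on finite-difference Hessian-vector products, printing it next to $2/\eta$. The toy loss does not itself exhibit the edge-of-stability behavior; it only shows how the two quantities being compared can be computed.

```python
import numpy as np

# Illustration only (not from the paper): track sharpness = lambda_max(Hessian)
# during gradient descent and print it next to the 2/eta threshold.

def loss_grad(w):
    # gradient of the toy loss L(w) = 0.25*(w0^4 + w1^4) + 0.5*w0*w1 (hypothetical)
    return np.array([w[0] ** 3 + 0.5 * w[1], w[1] ** 3 + 0.5 * w[0]])

def sharpness(w, iters=50, fd_eps=1e-4):
    # power iteration; H v is approximated by a central difference of gradients
    v = np.array([1.0, 0.0])
    lam = 0.0
    for _ in range(iters):
        hv = (loss_grad(w + fd_eps * v) - loss_grad(w - fd_eps * v)) / (2 * fd_eps)
        lam = float(v @ hv)                  # Rayleigh quotient in the current direction
        v = hv / (np.linalg.norm(hv) + 1e-12)
    return lam

eta = 0.05
w = np.array([2.0, -1.5])
for t in range(101):
    if t % 25 == 0:
        print(f"step {t:3d}   sharpness ~ {sharpness(w):6.2f}   2/eta = {2 / eta:.1f}")
    w = w - eta * loss_grad(w)
```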
Author:
Bartlett, Peter L., Long, Philip M.
We present a new general-purpose algorithm for learning classes of $[0,1]$-valued functions in a generalization of the prediction model, and prove a general upper bound on the expected absolute error of this algorithm in terms of a scale-sensitive generalization ...
External link:
http://arxiv.org/abs/2304.11059
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective ...
External link:
http://arxiv.org/abs/2210.01513
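For reference, one common formulation of the SAM update (stated here as an assumption about the method, not taken from the paper above) first takes a normalized ascent step of radius $\rho$ and then descends using the gradient at the perturbed point. The sketch below applies it to a hypothetical convex quadratic objective of the kind the entry mentions.

```python
import numpy as np

# Minimal sketch of a standard SAM update on a convex quadratic L(w) = 0.5 * w @ A @ w.
# The quadratic, rho, and eta below are illustrative choices, not values from the paper.

A = np.diag([10.0, 1.0])          # quadratic with two distinct curvatures
grad = lambda w: A @ w            # gradient of the quadratic

rho, eta = 0.1, 0.05              # SAM perturbation radius and step size
w = np.array([1.0, 1.0])
for t in range(200):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent step toward the locally "sharpest" point
    w = w - eta * grad(w + eps)                   # descend using the gradient at w + eps
print("final iterate:", w)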
Author:
Chatterji, Niladri S., Long, Philip M.
We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can ...
External link:
http://arxiv.org/abs/2209.09315
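The minimum $\ell_2$-norm interpolant used as a baseline in the entry above has a closed form in the overparameterized regime, $\hat{w} = X^{+} y$. The snippet below is a small illustration on hypothetical Gaussian data, not the paper's construction.

```python
import numpy as np

# Illustration: the minimum l2-norm interpolant w_hat = pinv(X) @ y in an
# overparameterized linear regression (d > n). Data below is hypothetical.

rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)   # noisy labels

w_hat = np.linalg.pinv(X) @ y     # minimum l2-norm solution of X w = y
print("training residual:", np.linalg.norm(X @ w_hat - y))   # ~0: it interpolates
print("norm of interpolant:", np.linalg.norm(w_hat))
```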
Author:
Long, Philip M., Servedio, Rocco A.
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by ...
External link:
http://arxiv.org/abs/2112.04590
Author:
Chatterji, Niladri S., Long, Philip M.
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime. We apply this result to obtain a lower bound for basis pursuit (the minimum $\ell_1$-norm interpolant) ...
External link:
http://arxiv.org/abs/2110.02914
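Basis pursuit, the minimum $\ell_1$-norm interpolant named above, can be written as a linear program. The sketch below is an illustration on hypothetical sparse data using SciPy's generic LP solver, not anything from the paper: it splits $w$ into positive and negative parts and minimizes their sum subject to exact interpolation.

```python
import numpy as np
from scipy.optimize import linprog

# Illustration: basis pursuit, min ||w||_1 subject to X w = y, as a linear program
# over the positive and negative parts of w. Data below is hypothetical.

rng = np.random.default_rng(0)
n, d = 20, 100
sparse_w = rng.normal(size=d) * (rng.random(d) < 0.05)      # a few nonzero coefficients
X = rng.normal(size=(n, d))
y = X @ sparse_w + 0.1 * rng.normal(size=n)                 # noisy labels

c = np.ones(2 * d)                          # objective: sum of positive + negative parts
A_eq = np.hstack([X, -X])                   # X (w_plus - w_minus) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
w_hat = res.x[:d] - res.x[d:]
print("interpolation residual:", np.linalg.norm(X @ w_hat - y))
print("l1 norm of interpolant:", np.abs(w_hat).sum())
```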
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ ...
External link:
http://arxiv.org/abs/2108.11489
Author:
Long, Philip M.
The Neural Tangent Kernel (NTK) is the wide-network limit of a kernel defined using neural networks at initialization, whose embedding is the gradient of the output of the network with respect to its parameters. We study the "after kernel", which is ...
External link:
http://arxiv.org/abs/2105.10585
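As a concrete reading of the definition above, the tangent-kernel embedding of an input is the gradient of the network output with respect to the parameters, and the kernel is the inner product of two such embeddings. The snippet below uses a toy two-layer tanh network chosen purely for illustration, not the paper's architecture, and evaluates the kernel at the current parameters.

```python
import numpy as np

# Illustration: empirical tangent kernel of a tiny two-layer network
# f(x) = v . tanh(W x), with k(x, x') = <grad_theta f(x), grad_theta f(x')>.

rng = np.random.default_rng(0)
d, m = 3, 16                                  # input dimension and width (illustrative)
W = rng.normal(size=(m, d)) / np.sqrt(d)
v = rng.normal(size=m) / np.sqrt(m)

def param_grad(x):
    # gradient of the scalar output with respect to (W, v), flattened
    h = np.tanh(W @ x)
    dv = h                                    # d f / d v
    dW = np.outer(v * (1 - h ** 2), x)        # d f / d W
    return np.concatenate([dW.ravel(), dv])

def tangent_kernel(x1, x2):
    return param_grad(x1) @ param_grad(x2)

x, xp = rng.normal(size=d), rng.normal(size=d)
print("k(x, x') =", tangent_kernel(x, xp))
```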
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies for smoothed approximations to the ReLU, such as Swish and ...
External link:
http://arxiv.org/abs/2102.04998
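For context, Swish is the smoothed ReLU mentioned above, $\mathrm{swish}(x) = x \cdot \sigma(x)$ with $\sigma$ the sigmoid. The small sketch below simply tabulates it and its derivative next to the ReLU; it is an illustration of the activation, not part of the paper's analysis.

```python
import numpy as np

# Illustration: the Swish activation swish(x) = x * sigmoid(x) and its derivative.
# Unlike the ReLU, both are smooth everywhere (no kink at 0).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    return x * sigmoid(x)

def swish_prime(x):
    s = sigmoid(x)
    return s + x * s * (1 - s)    # d/dx [x * sigmoid(x)]

for x in np.linspace(-4, 4, 9):
    print(f"x = {x:5.1f}   relu = {max(x, 0):5.2f}   swish = {swish(x):5.2f}   swish' = {swish_prime(x):5.2f}")
```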
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain ...
External link:
http://arxiv.org/abs/2012.02409