Showing 1 - 10 of 58 for search: '"Chatterji, Niladri S."'
Author:
Chatterji, Niladri S., Long, Philip M.
We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can…
External link:
http://arxiv.org/abs/2209.09315
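The minimum $\ell_2$-norm interpolant that serves as the point of comparison above has a closed form via the pseudoinverse. A minimal sketch in NumPy (the dimensions and data are illustrative, not from the paper):

    import numpy as np

    # Overparameterized regression: more features (d) than samples (n),
    # so infinitely many weight vectors interpolate the data exactly.
    rng = np.random.default_rng(0)
    n, d = 50, 200
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Minimum l2-norm interpolant: w = X^T (X X^T)^{-1} y = pinv(X) @ y.
    w = np.linalg.pinv(X) @ y
    assert np.allclose(X @ w, y)  # zero training error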
While a broad range of techniques have been proposed to tackle distribution shift, the simple baseline of training on an $\textit{undersampled}$ balanced dataset often achieves close to state-of-the-art accuracy across several popular benchmarks…
External link:
http://arxiv.org/abs/2205.13094
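The undersampling baseline mentioned above can be stated in a few lines: keep only as many examples of each class as the rarest class has. An illustrative sketch (function name and interface are mine, not the paper's):

    import numpy as np

    def undersample(X, y, seed=0):
        # Randomly keep min-class-count examples from every class.
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        m = counts.min()
        idx = np.concatenate(
            [rng.choice(np.flatnonzero(y == c), size=m, replace=False)
             for c in classes]
        )
        return X[idx], y[idx]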
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR…
External link:
http://arxiv.org/abs/2202.07626
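A toy version of the training setup described above, XOR-style binary labels, a randomly initialized two-layer ReLU network, and full-batch gradient descent on the logistic loss, might look as follows; the width, step size, and label rule are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, width = 200, 2, 64
    X = rng.standard_normal((n, d))
    y = np.sign(X[:, 0] * X[:, 1])        # XOR-of-signs labels in {-1, +1}

    W = rng.standard_normal((width, d)) / np.sqrt(d)       # random init
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

    for _ in range(2000):
        h = np.maximum(X @ W.T, 0.0)      # hidden ReLU features, (n, width)
        f = h @ a                         # network output
        # derivative of log(1 + exp(-y f)) w.r.t. f, clipped for stability
        g = -y / (1.0 + np.exp(np.clip(y * f, -30, 30)))
        grad_W = ((g[:, None] * (X @ W.T > 0)) * a).T @ X / n
        W -= 0.5 * grad_W                 # train first layer; a stays fixed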
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the…
External link:
http://arxiv.org/abs/2202.05928
Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks…
External link:
http://arxiv.org/abs/2112.12986
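For reference, importance weighting in its classic form reweights each example's loss by a density ratio between the target and source distributions; a generic sketch (not the paper's method):

    import numpy as np

    def importance_weighted_loss(losses, weights):
        # losses: per-example losses on source-distribution data.
        # weights: w_i = p_target(x_i) / p_source(x_i).
        return np.mean(weights * losses)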
Author:
Chatterji, Niladri S., Long, Philip M.
We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime. We apply this result to obtain a lower bound for basis pursuit (the minimum $\ell_1$-norm interpolant)…
External link:
http://arxiv.org/abs/2110.02914
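Basis pursuit, the minimum $\ell_1$-norm interpolant studied here, solves $\min_{w \in \mathbb{R}^d} \|w\|_1$ subject to $Xw = y$, where $X$ is the $n \times d$ design matrix with $d > n$ and $y$ is the response vector; the constraint forces exact interpolation of the training data.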
The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$…
External link:
http://arxiv.org/abs/2108.11489
We study a theory of reinforcement learning (RL) in which the learner receives binary feedback only once at the end of an episode. While this is an extreme test case for theory, it is also arguably more representative of real-world applications than…
External link:
http://arxiv.org/abs/2105.14363
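The interaction protocol described above, no per-step reward and only one binary signal when the episode ends, might be sketched as follows; the environment interface (reset, step, binary_feedback) is hypothetical, not from the paper:

    def run_episode(env, policy, horizon):
        # The learner observes states and acts for `horizon` steps
        # without seeing any intermediate reward.
        trajectory = []
        state = env.reset()
        for _ in range(horizon):
            action = policy(state)
            trajectory.append((state, action))
            state = env.step(action)
        # A single binary outcome is revealed only at the very end.
        feedback = env.binary_feedback()  # 0 or 1
        return trajectory, feedback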
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies for smoothed approximations to the ReLU, such as Swish and…
External link:
http://arxiv.org/abs/2102.04998
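Swish, one of the smoothed ReLU approximations covered by the analysis, is $x \mapsto x \cdot \sigma(x)$ with $\sigma$ the logistic sigmoid; a one-line reference implementation:

    import numpy as np

    def swish(x):
        # Smooth approximation to the ReLU: x * sigmoid(x).
        return x / (1.0 + np.exp(-x))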
We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain…
External link:
http://arxiv.org/abs/2012.02409