Showing 1 - 10 of 74 for search: '"Gurbuzbalaban, Mert"'
Author:
Gurbuzbalaban, Mert
We consider the problem of minimizing a strongly convex smooth function where the gradients are subject to additive worst-case deterministic errors that are square-summable. We study the trade-offs between the convergence rate and robustness to gradient errors …
External link:
http://arxiv.org/abs/2309.11481
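The entry above studies gradient descent when each gradient is corrupted by an additive, square-summable deterministic error. A minimal sketch of such an inexact gradient method on a strongly convex quadratic is given below; the quadratic objective, step size, and the 1/(k+1) error schedule are illustrative assumptions, not the worst-case construction analyzed in the paper.

```python
import numpy as np

# Minimal sketch: gradient descent with additive, square-summable gradient errors
# on a strongly convex quadratic f(x) = 0.5 * x^T A x - b^T x.
# The error schedule ||e_k|| = 1/(k+1) is an illustrative assumption.
rng = np.random.default_rng(0)
d = 5
A = np.diag(np.linspace(1.0, 10.0, d))   # strongly convex, L-smooth quadratic
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)

x = np.zeros(d)
step = 0.1                                # 1/L, where L = 10 is the largest eigenvalue of A
for k in range(200):
    error = rng.standard_normal(d)
    error *= 1.0 / ((k + 1) * np.linalg.norm(error))   # ||e_k|| = 1/(k+1), square-summable
    grad = A @ x - b + error              # inexact gradient
    x -= step * grad

print("distance to optimum:", np.linalg.norm(x - x_star))
```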
This paper considers the problem of understanding the behavior of a general class of accelerated gradient methods on smooth nonconvex functions. Motivated by some recent works that have proposed effective algorithms, based on Polyak's heavy ball method …
External link:
http://arxiv.org/abs/2307.07030
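For reference, Polyak's heavy ball iteration mentioned in the entry above takes the form $x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1})$. The sketch below applies it to a simple smooth nonconvex test function; the function, step size, and momentum value are illustrative choices, not those analyzed in the paper.

```python
import numpy as np

# Heavy ball iteration x_{k+1} = x_k - alpha * grad_f(x_k) + beta * (x_k - x_{k-1})
# on a simple smooth nonconvex function; alpha and beta are illustrative choices.
def f(x):
    return 0.25 * np.sum(x**4) - 0.5 * np.sum(x**2)   # nonconvex, minima at x_i = +/-1

def grad_f(x):
    return x**3 - x

alpha, beta = 0.05, 0.9
x_prev = x = np.array([2.0, -1.5, 0.3])
for _ in range(500):
    x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x

print("final point:", x, "f(x) =", f(x))
```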
Algorithmic stability is an important notion that has proven powerful for deriving generalization bounds for practical algorithms. The last decade has witnessed an increasing number of stability bounds for different algorithms applied on different classes of loss functions …
External link:
http://arxiv.org/abs/2305.12056
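Algorithmic stability, referenced in the entry above, is usually measured by how much an algorithm's output changes when a single training point is replaced. A rough empirical sketch of that comparison for SGD on least squares follows; the model, the data, and the single-point swap are assumptions made purely for illustration and are not the analysis in the paper.

```python
import numpy as np

# Rough empirical proxy for algorithmic stability: run the same SGD sample path
# on two datasets that differ in one example and compare the outputs.
# Linear least squares and the specific perturbation are illustrative assumptions.
rng = np.random.default_rng(1)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.standard_normal(d), rng.standard_normal()   # neighboring dataset

def sgd(X, y, steps=2000, lr=0.01, seed=2):
    rng = np.random.default_rng(seed)      # identical sample path on both datasets
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        i = rng.integers(len(y))
        # gradient of the single-example loss 0.5 * (x_i^T w - y_i)^2
        w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

w1, w2 = sgd(X, y), sgd(X2, y2)
print("parameter-level stability proxy ||w1 - w2||:", np.linalg.norm(w1 - w2))
```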
Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several …
External link:
http://arxiv.org/abs/2205.06689
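The multiplicative-noise mechanism mentioned above can be reproduced in a few lines: SGD on Gaussian linear regression has iterates of the form $w_{k+1} = (I - \eta x_k x_k^\top) w_k + \eta y_k x_k$, a random linear recursion whose stationary distribution can be heavy-tailed when the step size is large. The simulation below is only an illustrative sketch; the step size, dimension, and pure-noise targets are arbitrary choices.

```python
import numpy as np

# SGD on linear regression viewed as a random linear recursion:
# w_{k+1} = (I - eta * x_k x_k^T) w_k + eta * y_k * x_k.
# With Gaussian data and a large step size eta, the stationary distribution of
# w_k can be heavy-tailed; eta below is an illustrative choice.
rng = np.random.default_rng(0)
d, eta, burn_in, samples = 2, 0.6, 1000, 20000

w = np.zeros(d)
norms = []
for k in range(burn_in + samples):
    x = rng.standard_normal(d)
    y = rng.standard_normal()          # pure-noise targets keep the example short
    w = w - eta * (x @ w - y) * x      # one SGD step on 0.5 * (x^T w - y)^2
    if k >= burn_in:
        norms.append(np.linalg.norm(w))

norms = np.array(norms)
print("median ||w||:", np.median(norms), " 99.9th percentile:", np.quantile(norms, 0.999))
```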
In this work, we consider strongly convex strongly concave (SCSC) saddle point (SP) problems $\min_{x\in\mathbb{R}^{d_x}}\max_{y\in\mathbb{R}^{d_y}}f(x,y)$ where $f$ is $L$-smooth, $f(\cdot,y)$ is $\mu$-strongly convex for every $y$, and $f(x,\cdot)$ is $\mu$-strongly concave for every $x$ …
External link:
http://arxiv.org/abs/2202.09688
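A standard baseline for saddle point problems of the kind described above is the gradient descent-ascent iteration $x_{k+1} = x_k - \eta \nabla_x f(x_k, y_k)$, $y_{k+1} = y_k + \eta \nabla_y f(x_k, y_k)$. The sketch below runs it on a small strongly-convex-strongly-concave quadratic; the bilinear coupling and the step size are illustrative assumptions, not the algorithm studied in the paper.

```python
import numpy as np

# Gradient descent-ascent on a strongly-convex-strongly-concave quadratic
# f(x, y) = (mu/2)||x||^2 + x^T B y - (mu/2)||y||^2, whose saddle point is (0, 0).
# This baseline and its parameters are illustrative, not the paper's method.
rng = np.random.default_rng(0)
dx, dy, mu, eta = 3, 2, 1.0, 0.05
B = rng.standard_normal((dx, dy))

x, y = rng.standard_normal(dx), rng.standard_normal(dy)
for _ in range(2000):
    gx = mu * x + B @ y                  # grad_x f
    gy = B.T @ x - mu * y                # grad_y f
    x, y = x - eta * gx, y + eta * gy    # simultaneous descent-ascent step

print("||x||, ||y|| at the end:", np.linalg.norm(x), np.linalg.norm(y))
```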
Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of the behavior …
External link:
http://arxiv.org/abs/2101.02625
Published in:
SIAM Journal on Optimization 2022 32:2, 795-821
We present two classes of differentially private optimization algorithms derived from the well-known accelerated first-order methods. The first algorithm is inspired by Polyak's heavy ball method and employs a smoothing approach to decrease the accumulated noise …
External link:
http://arxiv.org/abs/2008.01989
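The differentially private algorithms summarized above build on accelerated first-order methods that must cope with noise injected into the gradients. A generic sketch of a noisy heavy-ball iteration is shown below for illustration only; the Gaussian noise level, clipping threshold, step size, and momentum are placeholder choices and not the calibrated privacy mechanism or the smoothing scheme from the paper.

```python
import numpy as np

# Generic noisy heavy-ball sketch: clip the gradient and add Gaussian noise
# before the momentum update. sigma, clip, alpha, and beta are placeholder
# values, NOT calibrated differential-privacy parameters.
rng = np.random.default_rng(0)

def grad_f(x):                     # simple strongly convex quadratic for illustration
    return 2.0 * x

alpha, beta, clip, sigma = 0.1, 0.7, 1.0, 0.1
x_prev = x = np.array([3.0, -2.0])
for _ in range(300):
    g = grad_f(x)
    g = g * min(1.0, clip / np.linalg.norm(g))          # gradient clipping
    g = g + sigma * rng.standard_normal(x.shape)        # Gaussian noise injection
    x, x_prev = x - alpha * g + beta * (x - x_prev), x  # heavy-ball update

print("final iterate:", x)
```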
Published in:
Published as a conference paper at International Conference on Machine Learning (ICML) 2021
In recent years, various notions of capacity and complexity have been proposed for characterizing the generalization properties of stochastic gradient descent (SGD) in deep learning. Some of the popular notions that correlate well with the performance …
External link:
http://arxiv.org/abs/2006.04740
Published in:
Information and Inference: A Journal of the IMA, vol. 12, no. 2, pp. 714-786, Jun. 2023
This paper considers the problem of understanding the exit time for trajectories of gradient-related first-order methods from saddle neighborhoods under some initial boundary conditions. Given the 'flat' geometry around saddle points, first-order methods …
External link:
http://arxiv.org/abs/2006.01106
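To make the saddle-neighborhood setting above concrete, the sketch below runs plain gradient descent near the saddle of $f(x_1, x_2) = \tfrac{1}{2}(x_1^2 - x_2^2)$ and records how many iterations it takes to leave a ball around the saddle; the test function, radius, step size, and initializations are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Illustrative exit-time experiment: gradient descent started near the saddle of
# f(x1, x2) = 0.5 * (x1^2 - x2^2), counting iterations until leaving a ball of
# radius r around the saddle at the origin. All constants are illustrative.
def grad_f(x):
    return np.array([x[0], -x[1]])     # gradient of 0.5 * (x1^2 - x2^2)

eta, r = 0.1, 1.0
for eps in [1e-2, 1e-4, 1e-6]:         # size of the initial unstable component
    x = np.array([0.5, eps])           # mostly along the stable direction
    k = 0
    while np.linalg.norm(x) < r and k < 100000:
        x = x - eta * grad_f(x)
        k += 1
    print(f"initial unstable component {eps:.0e}: exit after {k} iterations")
```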
Authors:
Gurbuzbalaban, Mert; Hu, Yuanhan
A traditional approach to initialization in deep neural networks (DNNs) is to sample the network weights randomly for preserving the variance of pre-activations. On the other hand, several studies show that during the training process, the distribution …
External link:
http://arxiv.org/abs/2005.11878
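The variance-preserving initialization mentioned above can be illustrated in a few lines: drawing weights with variance scaled by the fan-in keeps the pre-activation variance roughly constant across layers, while an unscaled draw makes it grow with depth. The sketch below compares the two; the network width, depth, and ReLU activation are illustrative choices and are not the initialization scheme proposed in the paper.

```python
import numpy as np

# Compare a fan-in-scaled Gaussian initialization with an unscaled one by
# tracking the variance of pre-activations through a deep ReLU stack.
# Width, depth, and the ReLU nonlinearity are illustrative choices.
rng = np.random.default_rng(0)
width, depth, batch = 256, 20, 128

def forward_variance(weight_std):
    h = rng.standard_normal((batch, width))
    for _ in range(depth):
        W = weight_std * rng.standard_normal((width, width))
        z = h @ W                      # pre-activations
        h = np.maximum(z, 0.0)         # ReLU
    return z.var()                     # variance at the last layer

print("fan-in scaled std sqrt(2/width):", forward_variance(np.sqrt(2.0 / width)))
print("unscaled std 1.0:               ", forward_variance(1.0))
```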