Showing 1 - 10 of 63
for search: '"Scieur, Damien"'
Author:
Maes, Lucas, Zhang, Tianyue H., Jolicoeur-Martineau, Alexia, Mitliagkas, Ioannis, Scieur, Damien, Lacoste-Julien, Simon, Guille-Escuret, Charles
Despite its widespread adoption, Adam's advantage over Stochastic Gradient Descent (SGD) lacks a comprehensive theoretical explanation. This paper investigates Adam's sensitivity to rotations of the parameter space. We demonstrate that Adam's perform…
External link:
http://arxiv.org/abs/2410.19964
Convex curvature properties are important in designing and analyzing convex optimization algorithms in the Hilbertian or Riemannian settings. In the case of the Hilbertian setting, strongly convex sets are well studied. Herein, we propose various def…
External link:
http://arxiv.org/abs/2312.03583
Author:
Scieur, Damien
Despite the impressive numerical performance of the quasi-Newton and Anderson/nonlinear acceleration methods, their global convergence rates have remained elusive for over 50 years. This study addresses this long-standing issue by introducing a frame…
External link:
http://arxiv.org/abs/2305.19179
We propose SING (StabIlized and Normalized Gradient), a plug-and-play technique that improves the stability and generalization of the Adam(W) optimizer. SING is straightforward to implement and has minimal computational overhead, requiring only a lay…
External link:
http://arxiv.org/abs/2305.15997
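The snippet above describes SING as a plug-and-play gradient transformation applied before the Adam(W) update. A minimal sketch of the layer-wise standardization idea that the abstract hints at (the function name and this plain-Python formulation are mine, not the paper's; the actual SING implementation may differ):

```python
import math

def standardize_layerwise(grads, eps=1e-8):
    """Standardize each layer's gradient to zero mean and unit variance.

    `grads` is a list of per-layer gradient vectors (lists of floats).
    The standardized gradients would then be fed to the usual Adam(W)
    update in place of the raw gradients. Hypothetical sketch only.
    """
    out = []
    for g in grads:
        n = len(g)
        mean = sum(g) / n
        var = sum((x - mean) ** 2 for x in g) / n
        std = math.sqrt(var) + eps  # eps guards against zero variance
        out.append([(x - mean) / std for x in g])
    return out
```

Because the transformation is applied per layer and independently of the optimizer state, it can be dropped in front of any existing optimizer, which matches the "plug-and-play" claim.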
Computing the Jacobian of the solution of an optimization problem is a central problem in machine learning, with applications in hyperparameter optimization, meta-learning, optimization as a layer, and dataset distillation, to name a few. Unrolled di…
External link:
http://arxiv.org/abs/2209.13271
The recently developed average-case analysis of optimization methods allows a more fine-grained and representative convergence analysis than usual worst-case results. In exchange, this analysis requires a more precise hypothesis over the data generat…
External link:
http://arxiv.org/abs/2206.09901
We consider the problem of upper bounding the expected log-likelihood sub-optimality of the maximum likelihood estimate (MLE), or a conjugate maximum a posteriori (MAP) for an exponential family, in a non-asymptotic way. Surprisingly, we found no gen…
External link:
http://arxiv.org/abs/2111.06826
Author:
Scieur, Damien, Kim, Youngsung
This paper considers classification problems with hierarchically organized classes. We force the classifier (hyperplane) of each class to belong to a sphere manifold, whose center is the classifier of its super-class. Then, individual sphere manifold…
External link:
http://arxiv.org/abs/2106.13549
Published in:
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:3028-3065, 2022
We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show that under some assumption on the spectral gap of Hessians in machine learning, cyclical step-sizes are provably faster than constant step-sizes. More precisely, we…
External link:
http://arxiv.org/abs/2106.09687
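The abstract above concerns momentum combined with a cyclic schedule of step sizes. A toy heavy-ball iteration on a one-dimensional quadratic illustrates the mechanics (the schedule, momentum value, and function here are illustrative choices of mine, not the paper's analyzed setting):

```python
def heavy_ball_cyclic(grad, x0, step_sizes, momentum=0.5, iters=100):
    """Heavy-ball (momentum) iteration with a cyclic step-size schedule.

    x_{k+1} = x_k - h_k * grad(x_k) + m * (x_k - x_{k-1}),
    where h_k cycles through `step_sizes`. Illustrative sketch only.
    """
    x_prev, x = x0, x0
    for k in range(iters):
        h = step_sizes[k % len(step_sizes)]  # cycle through the schedule
        x_next = x - h * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x
```

For example, minimizing f(x) = x^2 (so grad(x) = 2x) from x0 = 1.0 with the two-step cycle [0.1, 0.3] drives the iterate toward the minimizer at 0.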
Published in:
Foundations and Trends in Optimization: Vol. 5: No. 1-2, pp 1-245 (2021)
This monograph covers some recent advances in a range of acceleration techniques frequently used in convex optimization. We first use quadratic optimization problems to introduce two key families of methods, namely momentum and nested optimization sc…
External link:
http://arxiv.org/abs/2101.09545