On the almost sure convergence of stochastic gradient descent in non-convex problems
Author: | Mertikopoulos, P., Hallak, N., Kavis, A., Cevher, V. |
Contributors: | Performance analysis and optimization of LARge Infrastructures and Systems (POLARIS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Criteo AI Lab, Criteo [Paris], Ecole Polytechnique Fédérale de Lausanne (EPFL), ANR-19-P3IA-0003, MIAI, MIAI @ Grenoble Alpes (2019), ANR-11-LABX-0025, PERSYVAL-lab, Systèmes et Algorithmes Pervasifs au confluent des mondes physique et numérique (2011), ANR-19-CE48-0018, ALIAS, Apprentissage adaptatif multi-agent (2019) |
Language: | English |
Year of publication: | 2020 |
Subject: | FOS: Computer and information sciences; FOS: Mathematics; Computer Science - Machine Learning (cs.LG); Statistics - Machine Learning (stat.ML); Mathematics - Optimization and Control (math.OC); Mathematics - Probability (math.PR); [MATH.MATH-OC] Mathematics [math]/Optimization and Control [math.OC]; non-convex optimization; stochastic gradient descent; stochastic approximation; ml-ai; 2020 Mathematics Subject Classification: Primary 90C26; Secondary 90C30, 90C15, 37N40, 62L20 |
Source: | NeurIPS 2020 - 34th International Conference on Neural Information Processing Systems, Dec 2020, Vancouver, Canada, pp. 1-32; Scopus-Elsevier |
Description: | This paper analyzes the trajectories of stochastic gradient descent (SGD) to help understand the algorithm's convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, going beyond existing positive-probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered. Finally, we prove that the algorithm's rate of convergence to Hurwicz minimizers is $\mathcal{O}(1/n^{p})$ if the method is employed with a $\Theta(1/n^{p})$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size, as it suggests that a cool-down phase with a vanishing step-size could lead to faster convergence; we demonstrate this heuristic using ResNet architectures on CIFAR. Comment: 32 pages, 8 figures |
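As a rough illustration of the step-size schedules discussed in the description, the sketch below runs plain SGD with a vanishing $\Theta(1/n^{p})$ step-size on a toy non-convex objective. This is a minimal sketch under assumed defaults: the names (`sgd_power_stepsize`, `noisy_grad`), the toy objective, and all parameter values are illustrative choices, not taken from the paper or its experiments.

```python
import numpy as np

def sgd_power_stepsize(grad_oracle, x0, p=0.6, gamma0=0.1, n_iter=10_000, rng=None):
    """Plain SGD with a Theta(1/n^p) step-size schedule: gamma_n = gamma0 / n**p.

    `grad_oracle(x, rng)` is assumed to return an unbiased stochastic gradient
    of the objective at x; all defaults here are illustrative, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        gamma_n = gamma0 / n**p          # vanishing step-size, Theta(1/n^p)
        x = x - gamma_n * grad_oracle(x, rng)
    return x

# Toy usage: noisy gradients of the non-convex objective f(x) = sum(x**4 - x**2),
# whose (Hurwicz) minimizers lie at +/- 1/sqrt(2) in each coordinate.
def noisy_grad(x, rng):
    return 4 * x**3 - 2 * x + 0.1 * rng.standard_normal(x.shape)

print(sgd_power_stepsize(noisy_grad, x0=np.array([2.0, -1.5]), p=0.6))
```

The cool-down heuristic mentioned above could be emulated by first running with a constant step-size and then switching to a vanishing schedule of this form; the paper's actual ResNet/CIFAR experiments are not reproduced here.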
Database: | OpenAIRE |
External link: |