Strong error analysis for stochastic gradient descent optimization algorithms
Author: | Ariel Neufeld, Philippe von Wurstemberger, Arnulf Jentzen, Benno Kuckuck |
Year of publication: | 2020 |
Subject: | Stochastic gradient descent; Stochastic approximation algorithms; Strong error analysis; Partial differential equation; Applied Mathematics; General Mathematics; Computational Mathematics; Numerical Analysis (math.NA); Probability (math.PR); Mathematics - Numerical Analysis; Mathematics - Probability; FOS: Mathematics; Mathematics |
Source: | IMA Journal of Numerical Analysis, 41 (1) |
ISSN: | 0272-4979 (print); 1464-3642 (online) |
DOI: | 10.1093/imanum/drz055 |
Description: | Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a wide range of machine learning applications. In this article we perform a rigorous strong error analysis for SGD optimization algorithms. In particular, we prove for every arbitrarily small $\varepsilon \in (0,\infty)$ and every arbitrarily large $p \in (0,\infty)$ that the considered SGD optimization algorithm converges in the strong $L^p$-sense with order $\frac{1}{2}-\varepsilon$ to the global minimum of the objective function of the considered stochastic approximation problem, under standard convexity-type assumptions on the objective function and relaxed assumptions on the moments of the stochastic errors appearing in the employed SGD optimization algorithm. The key ideas in our convergence proof are, first, to employ techniques from the theory of Lyapunov-type functions for dynamical systems to develop a general convergence machinery for SGD optimization algorithms based on such functions; second, to apply this general machinery to concrete Lyapunov-type functions with polynomial structure; and, thereafter, to perform an induction argument along the powers appearing in the Lyapunov-type functions in order to achieve strong $L^p$-convergence rates for every arbitrarily large $p \in (0,\infty)$. This article also contains an extensive review of results on SGD optimization algorithms in the scientific literature. |
Database: | OpenAIRE |
External link: |
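The strong $L^p$-convergence rate described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' code: it runs SGD on the simple quadratic stochastic approximation problem $f(\theta) = \mathbb{E}[(\theta - X)^2/2]$ with $X \sim \mathcal{N}(\mu, 1)$ and global minimum $\theta^* = \mu$, and Monte-Carlo-estimates the strong $L^p$ error $\mathbb{E}[|\Theta_n - \theta^*|^p]^{1/p}$. All parameter choices (step sizes, $p$, sample counts) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (assumed toy problem, not the paper's setting):
# SGD for f(theta) = E[(theta - X)^2 / 2] with X ~ N(mu, 1), whose
# global minimum is theta* = mu. The stochastic gradient estimate is
# G_n(theta) = theta - X_n, and we use step sizes gamma_n = c / n.

rng = np.random.default_rng(0)
mu = 1.0          # minimizer theta* of the toy objective (assumed)
c = 1.0           # step-size constant (illustrative choice)
p = 4.0           # power of the strong L^p norm being measured
n_steps = 10_000
n_paths = 2_000   # Monte Carlo paths used to estimate the L^p error

theta = np.zeros(n_paths)          # Theta_0 = 0 on every path
checkpoints = {100, 1_000, 10_000}
for n in range(1, n_steps + 1):
    X = rng.normal(mu, 1.0, size=n_paths)  # one fresh sample per path
    theta -= (c / n) * (theta - X)         # SGD step, gamma_n = c / n
    if n in checkpoints:
        # strong L^p error: E[|Theta_n - theta*|^p]^(1/p)
        err = np.mean(np.abs(theta - mu) ** p) ** (1.0 / p)
        print(f"n = {n:6d}: L^{p:g} error = {err:.4f}, "
              f"sqrt(n) * error = {np.sqrt(n) * err:.3f}")
```

In this toy case, with $\gamma_n = 1/n$ and $\Theta_0 = 0$ the iterate $\Theta_n$ coincides with the running sample mean of $X_1, \dots, X_n$, so the printed quantity $\sqrt{n}$ times the $L^p$ error should remain roughly constant across the checkpoints, consistent with a strong convergence order of $\frac{1}{2}-\varepsilon$ as stated in the description.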