Convergence rates of stochastic gradient method with independent sequences of step-size and momentum weight

Author:	Hwang, Wen-Liang
Year of publication:	2024
Subject:	Computer Science - Machine Learning; Mathematics - Optimization and Control
Document type:	Working Paper
Description:	In large-scale learning algorithms, a momentum term is usually included in the stochastic sub-gradient method to improve the learning speed, because it can navigate ravines efficiently to reach a local minimum. However, the step-size and momentum weight hyper-parameters must be appropriately tuned to optimize convergence. We thus use stochastic programming to analyze the convergence rate, with Polyak's acceleration, of two commonly used step-size learning rates: "diminishing-to-zero" and "constant-and-drop" (where the sequence is divided into stages and a constant step-size is applied at each stage), for strongly convex functions over a compact convex set with bounded sub-gradients. For the former, we show that the convergence rate can be written as a product of an exponential in the step-size and a polynomial in the momentum weight. Our analysis justifies the convergence of the default momentum weight setting combined with the diminishing-to-zero step-size sequence in large-scale machine learning software. For the latter, we present the condition on the momentum weight sequence for convergence at each stage.
Database:	arXiv
External link:	http://arxiv.org/abs/2408.02678
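
The two step-size schedules named in the description can be illustrated with a minimal sketch of a projected stochastic gradient method with Polyak (heavy-ball) momentum. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective 0.5*||x - target||^2, the box constraint standing in for the compact convex set, the Gaussian gradient noise, the constant momentum weight 0.9, and all function names and schedule constants.

```python
import math
import random


def project_box(x, radius):
    """Project onto the box [-radius, radius]^d (an assumed compact convex set)."""
    return [max(-radius, min(radius, xi)) for xi in x]


def noisy_grad(x, target, rng, noise=0.1):
    """Gradient of the toy objective 0.5*||x - target||^2 plus Gaussian noise."""
    return [xi - ti + rng.gauss(0.0, noise) for xi, ti in zip(x, target)]


def diminishing(t, c=1.0):
    """Diminishing-to-zero schedule: the step-size shrinks like c / (t + 1)."""
    return c / (t + 1)


def constant_and_drop(t, alpha0=0.1, stage_len=200, drop=0.5):
    """Constant-and-drop schedule: constant within a stage, scaled by `drop` between stages."""
    return alpha0 * drop ** (t // stage_len)


def heavy_ball(step_size, beta=0.9, steps=2000, radius=5.0, seed=0):
    """Projected stochastic gradient method with heavy-ball momentum.

    Update: x_next = Proj(x - alpha_t * g_t + beta * (x - x_prev)).
    """
    rng = random.Random(seed)
    target = [1.0, -2.0]      # minimizer, chosen inside the box
    x_prev = x = [4.0, 4.0]   # identical start points, so the first momentum term is zero
    for t in range(steps):
        g = noisy_grad(x, target, rng)
        alpha = step_size(t)
        candidate = [xi - alpha * gi + beta * (xi - pi)
                     for xi, gi, pi in zip(x, g, x_prev)]
        x_prev, x = x, project_box(candidate, radius)
    return x, target


def distance(x, target):
    """Euclidean distance between the final iterate and the minimizer."""
    return math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, target)))


if __name__ == "__main__":
    for schedule in (diminishing, constant_and_drop):
        x, target = heavy_ball(schedule)
        print(schedule.__name__, distance(x, target))
```

Running the script prints the final distance to the minimizer under each schedule. The sketch keeps the momentum weight fixed, whereas the paper treats the step-size and the momentum weight as independent sequences; a time-varying weight could be passed in the same way as `step_size`.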