Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

Autor:	Meng, Si Yi, Orvieto, Antonio, Cao, Daniel Yiming, De Sa, Christopher
Rok vydání:	2024
Předmět:	Computer Science - Machine Learning Mathematics - Optimization and Control
Druh dokumentu:	Working Paper
Popis:	We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly-separable data, it is known that GD converges to the minimizer with arbitrarily large step sizes, a property which no longer holds when the problem is not separable. In fact, the behaviour can be much more complex -- a sequence of period-doubling bifurcations begins at the critical step size $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. Using a smaller-than-critical step size guarantees convergence if initialized nearby the solution: but does this suffice globally? In one dimension, we show that a step size less than $1/\lambda$ suffices for global convergence. However, for all step sizes between $1/\lambda$ and the critical step size $2/\lambda$, one can construct a dataset such that GD converges to a stable cycle. In higher dimensions, this is actually possible even for step sizes less than $1/\lambda$. Our results show that although local convergence is guaranteed for all step sizes less than the critical step size, global convergence is not, and GD may instead converge to a cycle depending on the initialization.
Databáze:	arXiv
Externí odkaz:	http://arxiv.org/abs/2406.05033 Zobrazit plný text záznamu View this record from Arxiv