Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization

Autor: Zhaoran Wang, Xingguo Li, Junwei Lu, Han Liu, Jarvis Haupt, Raman Arora, Tuo Zhao
Rok vydání: 2019
Předmět:
FOS: Computer and information sciences
Hessian matrix
Pure mathematics
Machine Learning (stat.ML)
02 engineering and technology
010501 environmental sciences
Library and Information Sciences
01 natural sciences
Convexity
Machine Learning (cs.LG)
Matrix decomposition
010104 statistics & probability
symbols.namesake
Matrix (mathematics)
Statistics - Machine Learning
Saddle point
FOS: Mathematics
0202 electrical engineering
electronic engineering
information engineering

Symmetric matrix
0101 mathematics
Mathematics - Optimization and Control
0105 earth and related environmental sciences
Mathematics
020206 networking & telecommunications
Stationary point
Computer Science Applications
Maxima and minima
Computer Science - Learning
Optimization and Control (math.OC)
symbols
Phase retrieval
Information Systems
Zdroj: ITA
ISSN: 1557-9654
0018-9448
DOI: 10.1109/tit.2019.2898663
Popis: We propose a general theory for studying the \xl{landscape} of nonconvex \xl{optimization} with underlying symmetric structures \tz{for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks)}. In specific, we characterize the locations of stationary points and the null space of Hessian matrices \xl{of the objective function} via the lens of invariant groups\removed{for associated optimization problems, including low-rank matrix factorization, phase retrieval, and deep linear neural networks}. As a major motivating example, we apply the proposed general theory to characterize the global \xl{landscape} of the \xl{nonconvex optimization in} low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: ($\cR_1$) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvatures; ($\cR_2$) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and ($\cR_3$) the complement of the above regions, where the gradient has sufficiently large magnitudes. We further extend our result to the matrix sensing problem. Such global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
Databáze: OpenAIRE