Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization
Autor: | Zhaoran Wang, Xingguo Li, Junwei Lu, Han Liu, Jarvis Haupt, Raman Arora, Tuo Zhao |
---|---|
Rok vydání: | 2019 |
Předmět: |
FOS: Computer and information sciences
Hessian matrix Pure mathematics Machine Learning (stat.ML) 02 engineering and technology 010501 environmental sciences Library and Information Sciences 01 natural sciences Convexity Machine Learning (cs.LG) Matrix decomposition 010104 statistics & probability symbols.namesake Matrix (mathematics) Statistics - Machine Learning Saddle point FOS: Mathematics 0202 electrical engineering electronic engineering information engineering Symmetric matrix 0101 mathematics Mathematics - Optimization and Control 0105 earth and related environmental sciences Mathematics 020206 networking & telecommunications Stationary point Computer Science Applications Maxima and minima Computer Science - Learning Optimization and Control (math.OC) symbols Phase retrieval Information Systems |
Zdroj: | ITA |
ISSN: | 1557-9654 0018-9448 |
DOI: | 10.1109/tit.2019.2898663 |
Popis: | We propose a general theory for studying the \xl{landscape} of nonconvex \xl{optimization} with underlying symmetric structures \tz{for a class of machine learning problems (e.g., low-rank matrix factorization, phase retrieval, and deep linear neural networks)}. In specific, we characterize the locations of stationary points and the null space of Hessian matrices \xl{of the objective function} via the lens of invariant groups\removed{for associated optimization problems, including low-rank matrix factorization, phase retrieval, and deep linear neural networks}. As a major motivating example, we apply the proposed general theory to characterize the global \xl{landscape} of the \xl{nonconvex optimization in} low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many nonisolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: ($\cR_1$) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvatures; ($\cR_2$) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and ($\cR_3$) the complement of the above regions, where the gradient has sufficiently large magnitudes. We further extend our result to the matrix sensing problem. Such global landscape implies strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions. |
Databáze: | OpenAIRE |
Externí odkaz: |