Pathological Spectra of the Fisher Information Metric and Its Variants in Deep Neural Networks
Author: Shun-ichi Amari, Shotaro Akaho, Ryo Karakida
Year of publication: 2021
Subject: Fisher information metric; Fisher information; Hessian matrix; metric tensor; metric (mathematics); kernel (statistics); softmax function; feature vector; neural networks; Machine Learning (cs.LG); Machine Learning (stat.ML); Disordered Systems and Neural Networks (cond-mat.dis-nn); Cognitive Neuroscience; FOS: Computer and information sciences; FOS: Physical sciences
Source: Neural Computation 33:2274-2307
ISSN: 0899-7667 (print); 1530-888X (online)
DOI: 10.1162/neco_a_01411
Description: The Fisher information matrix (FIM) plays an essential role in statistics and machine learning as a Riemannian metric tensor or a component of the Hessian matrix of loss functions. Focusing on the FIM and its variants in deep neural networks (DNNs), we reveal their characteristic scale dependence on the network width, depth, and sample size when the network has random weights and is sufficiently wide. This study covers two widely used FIMs: one for regression with linear output and one for classification with softmax output. Both FIMs asymptotically show pathological eigenvalue spectra in the sense that a small number of eigenvalues become large outliers, depending on the width or sample size, while the others are much smaller. This implies that the local shape of the parameter space or loss landscape is very sharp in a few specific directions while almost flat in the others. In particular, the softmax output disperses the outliers and causes a tail of the eigenvalue density to spread from the bulk. We also show that pathological spectra appear in other variants of the FIM: one is the neural tangent kernel; another is a metric for the input signal and feature space that arises from feedforward signal propagation. Thus, we provide a unified perspective on the FIM and its variants that will lead to a more quantitative understanding of learning in large-scale DNNs. (An illustrative numerical sketch of the outlier spectrum appears after this record.)
Comment: 23 pages, 7 figures; v2: minor improvements, Section 3.4 added
Database: OpenAIRE
External link:
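
As a rough numerical illustration of the pathological spectrum described above, the sketch below builds the empirical FIM of a random network for regression with linear output and checks that its largest eigenvalue separates far from the bulk. Everything here is an assumption for illustration, not the paper's experiment: a one-hidden-layer ReLU network with Gaussian weights and inputs, sizes d = 100, M = 1000, N = 200, and unit output-noise variance. The one nontrivial step is standard: the nonzero eigenvalues of the P x P matrix F = (1/N) sum_n g_n g_n^T coincide with those of the N x N Gram matrix K with entries K[m, n] = (1/N) g_m . g_n, so the spectrum can be computed without ever forming F.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's experimental setup).
d, M, N = 100, 1000, 200          # input dim, hidden width, sample size

# Random one-hidden-layer ReLU network with a single linear output.
W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(M, d))   # hidden weights
w2 = rng.normal(0.0, 1.0 / np.sqrt(M), size=M)        # output weights
X = rng.normal(size=(N, d))                           # input samples

U = X @ W1.T                      # pre-activations, shape (N, M)
H = np.maximum(U, 0.0)            # ReLU activations
D = (U > 0.0).astype(float)       # ReLU derivative

# Regression FIM with unit noise variance: F = (1/N) sum_n g_n g_n^T,
# where g_n is the gradient of f(x_n) w.r.t. all parameters (W1 and w2).
# Its nonzero eigenvalues equal those of the N x N Gram matrix
# K[m, n] = (1/N) g_m . g_n, which has a cheap closed form:
#   W1-block: (x_m . x_n) * sum_i w2_i^2 D[m, i] D[n, i]
#   w2-block: H_m . H_n
K = ((X @ X.T) * ((D * w2**2) @ D.T) + H @ H.T) / N

eigs = np.sort(np.linalg.eigvalsh(K))[::-1]
print("top 5 eigenvalues :", np.round(eigs[:5], 2))
print("median eigenvalue :", round(float(np.median(eigs)), 3))
# Expect a single eigenvalue far above the rest (the pathological
# outlier), while the bulk of the spectrum stays much smaller.
```

Under these assumptions the largest eigenvalue should come out far above the median, with a gap that grows with the width M, mirroring the outlier-versus-bulk separation the abstract describes; with a centered activation such as tanh in place of ReLU, the dominant outlier is much weaker, which is consistent with the outliers originating from the mean mode of the feedforward signals.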