Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio

Author: Yoshua Bengio, Asja Fischer, Amos Storkey, Nicolas Ballas, Stanisław Jastrzębski, Devansh Arpit, Zachary Kenton
Language: English
Subject:
Source: Artificial Neural Networks and Machine Learning – ICANN 2018, 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III (Lecture Notes in Computer Science)
ISBN: 9783030014230
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-01424-7_39
Description: We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets.
Database: OpenAIRE
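
As a rough illustration of the claim in the description (a minimal sketch, not part of the record or the paper): the code below runs plain minibatch SGD on a synthetic least-squares problem with two hyperparameter settings that share the same learning-rate-to-batch-size ratio eta/B, the quantity the abstract identifies as governing SGD's dynamics. The data, model, and function names are assumptions chosen only for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))               # synthetic inputs (assumption, not from the paper)
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=1024)  # synthetic regression targets

def sgd(eta, batch_size, steps=2000):
    """Plain minibatch SGD on a least-squares objective; returns the final weights."""
    w = np.zeros(10)
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch_size)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w -= eta * grad
    return w

# Both runs share eta / batch_size = 0.001; per the paper's claim, runs with equal
# ratios should exhibit comparable stochastic dynamics despite different settings.
w_small = sgd(eta=0.032, batch_size=32)
w_large = sgd(eta=0.128, batch_size=128)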