Width of Minima Reached by Stochastic Gradient Descent is Influenced by Learning Rate to Batch Size Ratio
| Author | Yoshua Bengio, Asja Fischer, Amos Storkey, Nicolas Ballas, Stanisław Jastrzębski, Devansh Arpit, Zachary Kenton |
|---|---|
| Language | English |
| Subject | Computer Science::Machine Learning; generalization error; maxima and minima; stochastic gradient descent; convergence; deep neural networks; applied mathematics; mathematics |
| Source | Artificial Neural Networks and Machine Learning – ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part III. Lecture Notes in Computer Science. ISBN 9783030014230 |
| ISSN | 0302-9743; 1611-3349 |
| DOI | 10.1007/978-3-030-01424-7_39 |
| Description | We show that the dynamics and convergence properties of SGD are set by the ratio of learning rate to batch size. We observe that this ratio is a key determinant of the generalization error, which we suggest is mediated by controlling the width of the final minima found by SGD. We verify our analysis experimentally on a range of deep neural networks and datasets. (A small illustrative sketch of this scaling follows the record below.) |
| Database | OpenAIRE |
| External link | |
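
The ratio scaling described in the abstract can be illustrated with a small, self-contained experiment. The sketch below is not taken from the paper or from this record; the quadratic loss, the Gaussian per-example gradient noise, and the parameter names (`a`, `sigma`, `lr`, `batch_size`) are assumptions chosen for illustration. It runs SGD on a one-dimensional quadratic and estimates the stationary variance of the iterates, a crude proxy for the width of the region SGD explores around a minimum; in this toy setting the variance depends on the learning rate and batch size only through their ratio.

```python
# Illustrative sketch (assumptions, not the paper's experiments): SGD on
# L(theta) = 0.5 * a * theta**2 where each per-example gradient is
# a * theta + N(0, sigma^2). The mini-batch gradient noise then has variance
# sigma^2 / batch_size, and the stationary variance of theta scales with
# lr / batch_size (approximately lr * sigma^2 / (2 * a * batch_size)).
import numpy as np

def sgd_stationary_variance(lr, batch_size, a=1.0, sigma=1.0,
                            steps=100_000, burn_in=20_000, seed=0):
    """Run noisy SGD on the 1-D quadratic and return the empirical
    variance of theta after a burn-in period."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    samples = []
    for t in range(steps):
        # Mini-batch gradient: true gradient plus averaged per-example noise.
        grad = a * theta + rng.normal(0.0, sigma / np.sqrt(batch_size))
        theta -= lr * grad
        if t >= burn_in:
            samples.append(theta)
    return np.var(samples)

if __name__ == "__main__":
    # The first two settings share lr / batch_size and should give roughly
    # the same variance; the third doubles the ratio.
    for lr, batch_size in [(0.01, 10), (0.02, 20), (0.02, 10)]:
        var = sgd_stationary_variance(lr, batch_size)
        print(f"lr={lr:.3f}  B={batch_size:2d}  lr/B={lr / batch_size:.4f}  "
              f"empirical Var(theta)={var:.5f}  "
              f"small-lr theory={lr / (2 * batch_size):.5f}")
```

Under these assumptions the first two runs should print nearly identical variances, matching the small-learning-rate prediction lr·sigma²/(2·a·batch_size), while the third run, with twice the ratio, should settle into a region of roughly twice the variance.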