Improving the convergence of SGD through adaptive batch sizes

Author:	Sievert, Scott; Shah, Shrey
Year of publication:	2019
Subject:	Computer Science - Machine Learning; Mathematics - Optimization and Control; Statistics - Machine Learning
Document type:	Working Paper
Description:	Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function's gradient with a small number of training examples, aka the batch size. Small batch sizes require little computation for each model update but can yield high-variance gradient estimates, which poses some challenges for optimization. Conversely, large batches require more computation but can yield higher precision gradient estimates. This work presents a method to adapt the batch size to the model's training loss. For various function classes, we show that our method requires the same order of model updates as gradient descent while requiring the same order of gradient computations as SGD. This method requires evaluating the model's loss on the entire dataset every model update. However, the required computation is greatly reduced by approximating the training loss. We provide experiments that illustrate our methods require fewer model updates without increasing the total amount of computation.
Database:	arXiv
External link:	http://arxiv.org/abs/1910.08222
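
The description above says the batch size is adapted to the training loss, but this record does not give the exact rule. The following is only a minimal sketch of the general idea, under stated assumptions: a linear least-squares model, and an illustrative rule batch_size = ceil(c / loss) clipped to [1, n], so that batches grow as the loss falls. The function names, the constant c, the learning rate, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import math
import random


def predict(w, x):
    """Linear model output for one example."""
    return sum(wj * xj for wj, xj in zip(w, x))


def full_loss(w, X, y):
    """Mean squared error over the entire dataset."""
    return sum((predict(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def adaptive_batch_sgd(X, y, lr=0.05, c=2.0, max_updates=300, seed=0):
    """Mini-batch SGD whose batch size grows as the training loss shrinks.

    Assumed rule (illustrative only): batch_size = clip(ceil(c / loss), 1, n).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    history = []  # one (batch_size, loss_before_update) pair per model update
    for _ in range(max_updates):
        # Full-dataset loss at every update; the description notes that this
        # cost can be reduced by approximating the loss instead.
        loss = full_loss(w, X, y)
        batch_size = min(n, max(1, math.ceil(c / max(loss, 1e-12))))
        batch = rng.sample(range(n), batch_size)
        # Average gradient of the squared error over the sampled batch.
        grad = [0.0] * d
        for i in batch:
            residual = predict(w, X[i]) - y[i]
            for j in range(d):
                grad[j] += 2.0 * residual * X[i][j] / batch_size
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        history.append((batch_size, loss))
    return w, history


def make_data(n=256, noise=0.1, seed=1, w_true=(1.0, -2.0, 0.5)):
    """Synthetic regression data (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    y = [predict(w_true, x) + noise * rng.gauss(0, 1) for x in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    w, history = adaptive_batch_sgd(X, y)
    print("first (batch size, loss):", history[0])
    print("last  (batch size, loss):", history[-1])
```

Under this assumed rule, early updates use very small batches while the loss is large, and later updates use larger batches as the loss approaches the noise floor. The cap at n makes the method fall back to full gradient descent once the loss is small enough.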