Novel Gradient Sparsification Algorithm via Bayesian Inference

Authors: Ali Bereyhi, Ben Liang, Gary Boudreau, Ali Afana
Publication Year: 2024
Document Type: Working Paper
Description: Error accumulation is an essential component of the Top-$k$ sparsification method in distributed gradient descent. It implicitly scales the learning rate and prevents the slowdown of lateral movement, but it can also deteriorate convergence. This paper proposes a novel sparsification algorithm, regularized Top-$k$ (RegTop-$k$), that controls the learning-rate scaling induced by error accumulation. The algorithm is developed by viewing gradient sparsification as an inference problem and determining a Bayesian-optimal sparsification mask via maximum-a-posteriori estimation. It utilizes past aggregated gradients to evaluate posterior statistics, based on which it prioritizes the entries of the local gradient. Numerical experiments with ResNet-18 on CIFAR-10 show that at $0.1\%$ sparsification, RegTop-$k$ achieves about $8\%$ higher accuracy than standard Top-$k$.
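For context, the baseline the paper improves on can be sketched as follows. This is a minimal illustration of standard Top-$k$ sparsification with error accumulation (error feedback), not the RegTop-$k$ algorithm itself; the function name and interface are assumptions for illustration only.

```python
import numpy as np

def topk_with_error_feedback(grad, memory, k):
    """Standard Top-k sparsification with error accumulation.

    `grad` is the local gradient and `memory` is the error accumulated
    from previous rounds. Only the k largest-magnitude entries of the
    error-corrected gradient are transmitted; the untransmitted residual
    is carried over to the next round. (Illustrative baseline, not the
    paper's RegTop-k method.)
    """
    corrected = grad + memory                          # apply accumulated error
    idx = np.argpartition(np.abs(corrected), -k)[-k:]  # indices of top-k magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                       # entries sent to the server
    new_memory = corrected - sparse                    # residual accumulates locally
    return sparse, new_memory
```

Because `sparse + new_memory` always equals `grad + memory`, no gradient information is ever discarded; it is merely delayed, which is the mechanism the paper identifies as implicitly scaling the learning rate.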
Comment: To appear in Proc. IEEE International Workshop on Machine Learning for Signal Processing (MLSP) 2024
Database: arXiv