Robust Training of Neural Networks at Arbitrary Precision and Sparsity

Author: Ye, Chengxi, Chu, Grace, Liu, Yanfeng, Zhang, Yichi, Lew, Lukasz, Howard, Andrew
Publication year: 2024
Subject:
Document type: Working Paper
Description: The discontinuous operations inherent in quantization and sparsification introduce obstacles to backpropagation. This is particularly challenging when training deep neural networks in ultra-low-precision and sparse regimes. We propose a novel, robust, and universal solution: a denoising affine transform that stabilizes training under these challenging conditions. By formulating quantization and sparsification as perturbations during training, we derive a perturbation-resilient approach based on ridge regression. Our solution employs a piecewise constant backbone model to ensure a performance lower bound and features an inherent noise reduction mechanism to mitigate perturbation-induced corruption. This formulation allows existing models to be trained at arbitrarily low precision and sparsity levels with off-the-shelf recipes. Furthermore, our method provides a novel perspective on training temporal binary neural networks, contributing to ongoing efforts to narrow the gap between artificial and biological neural networks.
Database: arXiv
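
The abstract frames quantization as a perturbation of the full-precision weights and mentions a ridge-regression-based denoising affine transform. The sketch below is only an illustrative interpretation of that idea, not the authors' implementation: the quantizer, the function names (`quantize`, `denoising_affine`), and the regularization parameter `lam` are assumptions chosen for the example. It fits a scale and bias by ridge regression that map the quantized weights back toward the clean weights, which is one plausible reading of "denoising affine transform".

```python
import numpy as np

def quantize(w, bits=2):
    """Uniform symmetric quantizer (a stand-in; the paper's quantizer may differ)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit symmetric (ternary values)
    scale = np.max(np.abs(w)) / qmax if np.any(w) else 1.0
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def denoising_affine(w, w_q, lam=1e-3):
    """Treat w_q as a noisy observation of w and fit an affine map a * w_q + b
    by ridge regression on the slope (intercept unpenalized), then apply it.
    Illustrative interpretation only, not the authors' exact transform."""
    x, y = w_q.ravel(), w.ravel()
    x_mean, y_mean = x.mean(), y.mean()
    xc, yc = x - x_mean, y - y_mean
    a = (xc @ yc) / (xc @ xc + lam)     # ridge-regularized slope
    b = y_mean - a * x_mean
    return a * w_q + b

# Example: denoise a coarse (2-bit) quantization of random weights.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_q = quantize(w, bits=2)
w_d = denoising_affine(w, w_q)
print("quantization MSE:", np.mean((w_q - w) ** 2))
print("after denoising :", np.mean((w_d - w) ** 2))
```

In a training loop, such a correction would be applied to the quantized (or sparsified) weights in the forward pass so that the perturbation seen by the rest of the network is reduced; how the paper combines this with its piecewise constant backbone is not specified in the abstract.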