Magic for the Age of Quantized DNNs

Authors: Sawada, Yoshihide; Saiin, Ryuji; Suetake, Kazuma
Publication year: 2024
Document type: Working Paper
Description: Recently, the number of parameters in DNNs has increased explosively, as exemplified by LLMs (Large Language Models), making inference on small-scale computers more difficult. Model compression technology is therefore essential for integration into products. In this paper, we propose a method of quantization-aware training. We introduce a novel normalization (Layer-Batch Normalization) that is independent of the mini-batch size and requires no additional computation cost during inference. We then quantize the weights by the scaled round-clip function combined with weight standardization. We also quantize the activation functions using the same function and apply surrogate gradients to train the model with both quantized weights and quantized activation functions. We call this method Magic for the age of Quantized DNNs (MaQD). Experimental results show that our quantization method achieves compression with minimal accuracy degradation.
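The abstract names two ingredients, a scaled round-clip quantizer and weight standardization, without giving formulas. The sketch below illustrates one plausible reading of that pipeline; the exact scale/clip parameterization and the `n_bits`, `scale`, and `eps` names are assumptions, not the paper's definitions, and the surrogate-gradient step is only indicated in a comment since NumPy has no autodiff.

```python
import numpy as np

def weight_standardization(w, eps=1e-5):
    # Normalize weights to zero mean and unit variance before quantization
    # (the abstract applies this ahead of the round-clip step).
    return (w - w.mean()) / (w.std() + eps)

def scaled_round_clip(x, scale=1.0, n_bits=8):
    # Hypothetical form of a scaled round-clip quantizer:
    # map to a signed integer grid, round, clip, and rescale.
    q_max = 2 ** (n_bits - 1) - 1  # e.g. 127 for 8-bit signed
    q = np.clip(np.round(x / scale * q_max), -q_max, q_max)
    return q * scale / q_max

# During quantization-aware training, the backward pass would replace the
# zero-almost-everywhere gradient of round() with a surrogate gradient,
# e.g. a straight-through estimator (identity inside the clip range).
w = np.array([-1.2, -0.3, 0.0, 0.4, 0.9])
wq = scaled_round_clip(weight_standardization(w))
```

Every output value lies on the quantizer's grid and inside the clip range, which is the property quantization-aware training relies on at inference time.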
Comment: 14 pages, 5 figures, 4 tables
Database: arXiv