Magic for the Age of Quantized DNNs

Authors: Sawada, Yoshihide; Saiin, Ryuji; Suetake, Kazuma
Publication year: 2024
Document type: Working Paper
Description: Recently, the number of parameters in DNNs has increased explosively, as exemplified by LLMs (Large Language Models), making inference on small-scale computers more difficult. Model compression technology is therefore essential for integration into products. In this paper, we propose a method of quantization-aware training. We introduce a novel normalization (Layer-Batch Normalization) that is independent of the mini-batch size and requires no additional computation cost during inference. We then quantize the weights by the scaled round-clip function combined with weight standardization. We also quantize the activation functions using the same function and apply surrogate gradients to train the model with both quantized weights and quantized activation functions. We call this method Magic for the age of Quantized DNNs (MaQD). Experimental results show that our quantization method achieves compression with minimal accuracy degradation.
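The abstract names two ingredients, a scaled round-clip quantizer and weight standardization, without giving formulas. The sketch below illustrates one plausible reading of that pipeline; the exact scale/clip parameterization and the `n_bits`, `scale`, and `eps` names are assumptions, not the paper's definitions, and the surrogate-gradient step is only indicated in a comment since NumPy has no autodiff.

```python
import numpy as np

def weight_standardization(w, eps=1e-5):
    # Normalize weights to zero mean and unit variance before quantization
    # (the abstract applies this ahead of the round-clip step).
    return (w - w.mean()) / (w.std() + eps)

def scaled_round_clip(x, scale=1.0, n_bits=8):
    # Hypothetical form of a scaled round-clip quantizer:
    # map to a signed integer grid, round, clip, and rescale.
    q_max = 2 ** (n_bits - 1) - 1  # e.g. 127 for 8-bit signed
    q = np.clip(np.round(x / scale * q_max), -q_max, q_max)
    return q * scale / q_max

# During quantization-aware training, the backward pass would replace the
# zero-almost-everywhere gradient of round() with a surrogate gradient,
# e.g. a straight-through estimator (identity inside the clip range).
w = np.array([-1.2, -0.3, 0.0, 0.4, 0.9])
wq = scaled_round_clip(weight_standardization(w))
```

Every output value lies on the quantizer's grid and inside the clip range, which is the property quantization-aware training relies on at inference time.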
Comment: 14 pages, 5 figures, 4 tables
Database: arXiv