SANA: Sensitivity-Aware Neural Architecture Adaptation for Uniform Quantization

Autor: Mingfei Guo, Zhen Dong, Kurt Keutzer
Jazyk: angličtina
Rok vydání: 2023
Předmět:
Zdroj: Applied Sciences, Vol 13, Iss 18, p 10329 (2023)
Druh dokumentu: article
ISSN: 2076-3417
DOI: 10.3390/app131810329
Popis: Uniform quantization is widely taken as an efficient compression method in practical applications. Despite its merit of having a low computational overhead, uniform quantization fails to preserve sensitive components in neural networks when applied with ultra-low bit precision, which could lead to a non-trivial accuracy degradation. Previous works have applied mixed-precision quantization to address this problem. However, finding the correct bit settings for different layers always demands significant time and resource consumption. Moreover, mixed-precision quantization is not well supported on current general-purpose machines such as GPUs and CPUs and, thus, will cause intolerable overheads in deployment. To leverage the efficiency of uniform quantization while maintaining accuracy, in this paper, we propose sensitivity-aware network adaptation (SANA), which automatically modifies the model architecture based on sensitivity analysis to make it more compatible with uniform quantization. Furthermore, we formulated four different channel initialization strategies to accelerate the quantization-aware fine-tuning process of SANA. Our experimental results showed that SANA can outperform standard uniform quantization and other state-of-the-art quantization methods in terms of accuracy, with comparable or even smaller memory consumption. Notably, ResNet-50-SANA (24.4 MB) with W4A8 quantization achieved 77.8% top-one accuracy on ImageNet, which even surpassed the 77.6% of the full-precision ResNet-50 (97.8 MB) baseline.
Databáze: Directory of Open Access Journals