CANET: Quantized Neural Network Inference With 8-bit Carry-Aware Accumulator

Authors:	Jingxuan Yang, Xiaoqin Wang, Yiying Jiang
Language:	English
Year of publication:	2024
Subject:	Convolutional neural network; quantization; efficient inference; low-precision accumulator; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 38765-38772 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3369889
Description:	Neural network quantization represents weights and activations with few bits, greatly reducing the overhead of multiplications. However, due to the recursive accumulation operations, high-precision accumulators are still required in multiply-accumulate (MAC) units to avoid overflow, incurring significant computational overhead. This constraint limits the efficient deployment of quantized NNs on resource-constrained platforms. To address this problem, we present a novel framework named CANET, which adapts the 8-bit quantized model to execute MAC operations with 8-bit accumulators. CANET not only employs 8-bit carry-aware accumulators to represent overflow data correctly, but also adaptively learns the optimal format per layer to minimize truncation errors. Meanwhile, a weight-oriented reordering method is developed to reduce the transfer length of the carry. CANET is evaluated on three networks in the ImageNet classification task, where comparable performance with state-of-the-art methods is realized. Finally, we implement the proposed architecture on a custom hardware platform, demonstrating a reduction of 40% in power and 49% in area compared with the MAC unit with 32-bit accumulators.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/ede63cc5ab2c4735bfd21a00f50e49f1
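
The overflow problem that the description refers to can be illustrated with a small sketch. It only shows why a plain 8-bit accumulator gives a wrong result when 8-bit products are summed; it does not implement CANET's carry-aware accumulator, its learned per-layer format, or its weight reordering. The function name `mac`, the wrap-around (two's complement) overflow behaviour, and the sample values are assumptions chosen for illustration.

```python
def mac(weights, activations, acc_bits):
    """Multiply-accumulate with a signed accumulator of acc_bits bits.

    Assumption: on overflow the accumulator wraps around (two's complement),
    as a plain fixed-width hardware register would.
    """
    half = 1 << (acc_bits - 1)      # e.g. 128 for an 8-bit accumulator
    modulus = 1 << acc_bits         # e.g. 256 for an 8-bit accumulator
    acc = 0
    for w, a in zip(weights, activations):
        # Add the product, then wrap the sum into [-half, half - 1].
        acc = ((acc + w * a + half) % modulus) - half
    return acc


# Hypothetical quantized values; each fits comfortably in a signed 8-bit range.
weights = [12, -7, 25, 9]
activations = [10, 3, 8, 4]

exact = mac(weights, activations, 32)   # wide accumulator: no overflow
narrow = mac(weights, activations, 8)   # plain 8-bit accumulator: wraps

print(exact)   # 335
print(narrow)  # 79
```

The true sum is 335, which lies outside the signed 8-bit range of -128 to 127, so the narrow accumulator silently wraps to 79. This is the kind of error that, according to the description, forces conventional MAC units to keep high-precision accumulators.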