Differentiable Neural Architecture, Mixed Precision and Accelerator Co-Search

Author:	Krishna Teja Chitty-Venkata, Yiming Bian, Murali Emani, Venkatram Vishwanath, Arun K. Somani
Language:	English
Year of publication:	2023
Subject:	Mixed precision quantization; hardware-aware neural architecture search; accelerator network co-search; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol. 11, pp. 106670-106687 (2023)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3320133
Description:	Quantization, effective Neural Network architecture, and efficient accelerator hardware are three important design paradigms for maximizing accuracy and efficiency. Mixed Precision Quantization assigns different precisions to different Neural Network layers to optimize inference. Neural Architecture Search (NAS) automatically designs a neural network for a given task and can be extended to search for the precision of each weight and activation matrix. In this paper, we develop three methods: (i) a Fast Differentiable Hardware-aware Mixed Precision Quantization Search that finds the optimal precision, (ii) a Joint Differentiable Hardware-aware Architecture and Mixed Precision Quantization Co-search, and (iii) a Joint Accelerator, Architecture, and Precision triple co-search that finds the best of all three worlds. We demonstrate the effectiveness of the proposed methods on the Bitfusion accelerator by searching for mixed precision models on MobileNetV2. The searched models achieve a better accuracy-latency trade-off than manually designed models and models found by previously proposed search methods.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/2495076778684d1fa1f37f0873e6a844
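
The description above characterizes the precision search as differentiable and hardware-aware. The following is a minimal sketch of that general idea and is not the authors' implementation: each layer holds one learnable logit per candidate bit width, a softmax turns the logits into mixing weights, and the expected latency under those weights can be added to the training loss as a penalty. The candidate bit widths, the latency numbers, and all function names here are hypothetical and chosen only for illustration. In practice an automatic differentiation framework would supply the gradients with respect to the logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fake_quantize(w, bits, w_max=1.0):
    """Uniform symmetric quantization of a scalar, clipped to [-w_max, w_max]."""
    levels = 2 ** (bits - 1) - 1          # positive quantization levels
    clipped = max(-w_max, min(w_max, w))
    return round(clipped / w_max * levels) / levels * w_max

def mixed_precision_weight(w, alphas, bit_choices):
    """Softmax-weighted mixture of the weight quantized at each candidate precision."""
    probs = softmax(alphas)
    return sum(p * fake_quantize(w, b) for p, b in zip(probs, bit_choices))

def expected_latency(alphas, latency_per_choice):
    """Expected latency under the relaxed (softmax) choice; usable as a loss penalty."""
    probs = softmax(alphas)
    return sum(p * l for p, l in zip(probs, latency_per_choice))

def select_precision(alphas, bit_choices):
    """After the search, keep the bit width with the largest logit."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return bit_choices[best]

# Hypothetical search space and per-choice latencies (arbitrary units,
# assumed values, not measurements from any accelerator).
BIT_CHOICES = [2, 4, 8]
LATENCY = [1.0, 2.0, 4.0]

alphas = [0.0, 0.0, 0.0]                  # uniform at the start of the search
relaxed_w = mixed_precision_weight(0.37, alphas, BIT_CHOICES)
penalty = expected_latency(alphas, LATENCY)
```

With uniform logits, the penalty is simply the mean of the candidate latencies; as training moves the logits, both the mixed weight and the penalty shift toward the preferred precision, and `select_precision` reads off the final choice per layer.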