Differentiable Neural Architecture, Mixed Precision and Accelerator Co-Search

Authors: Krishna Teja Chitty-Venkata, Yiming Bian, Murali Emani, Venkatram Vishwanath, Arun K. Somani
Language: English
Year of publication: 2023
Subject:
Source: IEEE Access, Vol 11, Pp 106670-106687 (2023)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3320133
Description: Quantization, effective Neural Network architecture, and efficient accelerator hardware are three important design paradigms for maximizing accuracy and efficiency. Mixed Precision Quantization assigns different precisions to different Neural Network layers for optimized inference. Neural Architecture Search (NAS) automatically designs a neural network for a given task and can be extended to search for the precision of each weight and activation matrix. In this paper, we develop the following three methods: (i) a Fast Differentiable Hardware-aware Mixed Precision Quantization Search method to find the optimal precision, (ii) a Joint Differentiable Hardware-aware Architecture and Mixed Precision Quantization Co-search, and (iii) a Joint Accelerator, Architecture, and Precision triple co-search to find the best combination across all three design spaces. We demonstrate the effectiveness of the proposed methods on the Bitfusion accelerator by searching mixed precision models on MobilenetV2. The resulting models achieve a better accuracy-latency trade-off than manually designed models and models found by previously proposed search methods.
Database: Directory of Open Access Journals
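
Note: the description mentions differentiable, hardware-aware mixed precision search without giving details. The following is a minimal sketch of the general idea only, not the paper's method. It assumes a softmax relaxation over candidate bit-widths, as is common in differentiable NAS. The function names, the symmetric uniform quantizer, and the per-bit-width latency table are hypothetical and chosen for illustration.

```python
import math

def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats (assumes bits >= 2)."""
    levels = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / levels                  # width of one quantization step
    return [round(v / scale) * scale for v in values]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_precision_weights(values, bit_choices, alphas):
    """Softmax-weighted mixture of the quantized copies of one layer's weights.

    `alphas` are the searchable logits, one per candidate bit-width.
    """
    probs = softmax(alphas)
    quantized = [quantize(values, b) for b in bit_choices]
    return [sum(p * q[i] for p, q in zip(probs, quantized))
            for i in range(len(values))]

def expected_latency(bit_choices, alphas, latency_table):
    """Expected layer latency under the current bit-width distribution.

    `latency_table` maps bit-width -> latency; the numbers are made up here
    and would come from a hardware model or measurements in practice.
    """
    probs = softmax(alphas)
    return sum(p * latency_table[b] for p, b in zip(probs, bit_choices))

# Illustrative usage for a single layer with three candidate precisions.
weights = [0.5, -1.0, 0.26, 0.03]
bit_choices = [2, 4, 8]
alphas = [0.0, 0.0, 0.0]                      # uniform at the start of search
latency_table = {2: 1.0, 4: 2.0, 8: 4.0}      # hypothetical latency units

relaxed = mixed_precision_weights(weights, bit_choices, alphas)
latency = expected_latency(bit_choices, alphas, latency_table)
```

In a real search, the logits would be trained by gradient descent on a loss that combines task accuracy with the expected latency term, and each layer would keep the bit-width with the largest logit once the search ends.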