Mixed-Precision Neural Architecture Search and Dynamic Split Point Selection for Split Computing

Authors: Naoki Nagamatsu, Kenshiro Ise, Yuko Hara
Language: English
Year of publication: 2024
Subject:
Source: IEEE Access, Vol 12, Pp 137439-137454 (2024)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3455251
Description: Split computing (SC) is an emerging technique for performing the inference task of deep neural network (DNN) models using both mobile devices and cloud/edge servers in a hybrid manner. To improve the end-to-end inference time over the network, SC splits a single DNN model into a head model and a tail model for deployment on the mobile device and the server, respectively. A further extension of SC, referred to as dynamic SC (DSC), determines the split point dynamically depending on network conditions such as bandwidth. This article proposes a DNN optimization approach for DSC based on mixed-precision quantization. Given a vanilla DNN model, our work optimizes the model in two steps. First, a DSC-aware mixed-precision layer-wise quantization is performed statically via neural architecture search to generate multiple potential split points. Then, a bitwidth-wise DSC algorithm is applied to dynamically select one optimal split point among the candidate points. Our evaluation on the EfficientNet-B0 and EfficientNet-B3 architectures demonstrated that our work provides more effective split points than existing quantization methods while mitigating the degradation of inference accuracy. In terms of the end-to-end inference time, on the EfficientNet-B0 (B3) architecture, our work obtained relative average and maximum gains of 9.12% (4.05%) and 27.49% (12.42%), respectively, over a state-of-the-art mixed-precision quantization method while achieving comparable accuracy.
Database: Directory of Open Access Journals
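
Illustrative note: the abstract describes selecting one split point at run time from a set of candidates according to network conditions such as bandwidth. The following is a minimal sketch of that general idea, not the article's algorithm. It assumes a simple latency model (head time + transfer time + tail time), and every name and number in it (`SplitPoint`, `select_split_point`, the candidate values) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitPoint:
    """Hypothetical candidate split point (all fields are assumed values)."""
    name: str          # label of the layer after which the model is split
    head_ms: float     # assumed on-device latency of the head model
    tail_ms: float     # assumed server latency of the tail model
    num_elements: int  # number of activation values sent over the network
    bitwidth: int      # quantization bitwidth of the transmitted activation

    @property
    def payload_bits(self) -> int:
        # A lower bitwidth shrinks the data that must cross the network.
        return self.num_elements * self.bitwidth


def end_to_end_ms(point: SplitPoint, bandwidth_mbps: float) -> float:
    """Assumed latency model: head + transfer + tail, in milliseconds."""
    transfer_ms = point.payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return point.head_ms + transfer_ms + point.tail_ms


def select_split_point(candidates, bandwidth_mbps: float) -> SplitPoint:
    """Pick the candidate with the lowest estimated end-to-end latency."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    return min(candidates, key=lambda p: end_to_end_ms(p, bandwidth_mbps))


# Made-up candidates: an early split sends a large 8-bit activation,
# a late split sends a small 4-bit activation but runs more on the device.
CANDIDATES = [
    SplitPoint("early", head_ms=5.0, tail_ms=10.0, num_elements=1_000_000, bitwidth=8),
    SplitPoint("late", head_ms=30.0, tail_ms=2.0, num_elements=100_000, bitwidth=4),
]

if __name__ == "__main__":
    for mbps in (10, 100, 1000):
        best = select_split_point(CANDIDATES, mbps)
        print(f"{mbps} Mbps -> {best.name} ({end_to_end_ms(best, mbps):.1f} ms)")
```

Under these made-up numbers, the late split is preferred at low bandwidth, where transfer time dominates, and the early split is preferred once the link is fast enough that on-device compute becomes the bottleneck. A real system would measure the latencies and bandwidth rather than hard-code them.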