Authors: |
Naoki Nagamatsu, Kenshiro Ise, Yuko Hara |
Language: |
English |
Publication year: |
2024 |
Subject: |
|
Source: |
IEEE Access, Vol 12, Pp 137439-137454 (2024) |
Document type: |
article |
ISSN: |
2169-3536 |
DOI: |
10.1109/ACCESS.2024.3455251 |
Description: |
Split computing (SC) is an emerging technique for running deep neural network (DNN) inference jointly on mobile devices and cloud/edge servers. To reduce the end-to-end inference time over the network, SC splits a single DNN model into a head model, deployed on the mobile device, and a tail model, deployed on the server. An extension of SC, referred to as dynamic SC (DSC), determines the split point at run time according to network conditions such as bandwidth. This article proposes a DNN optimization approach for DSC based on mixed-precision quantization. Given a vanilla DNN model, our work optimizes the model in two steps. First, a DSC-aware, mixed-precision, layer-wise quantization is performed statically via neural architecture search to generate multiple potential split points. Then, a bitwidth-wise DSC algorithm is applied to dynamically select one optimal split point among the candidate points. Our evaluation on the EfficientNet-B0 and EfficientNet-B3 architectures demonstrated that our work provides more effective split points than existing quantization methods while mitigating the degradation of inference accuracy. In terms of end-to-end inference time, on the EfficientNet-B0 (B3) architecture, our work obtained relative average and maximum gains of 9.12% (4.05%) and 27.49% (12.42%), respectively, over a state-of-the-art mixed-precision quantization method while achieving comparable accuracy. |
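The dynamic selection step can be illustrated with a minimal sketch. It is not the article's bitwidth-wise DSC algorithm, which the abstract does not specify. It assumes a simple cost model in which end-to-end time is head latency plus transfer time plus tail latency, and the transfer payload is the number of intermediate-tensor elements times the bitwidth at the split point. The names `SplitCandidate`, `end_to_end_ms`, `select_split`, and `CANDIDATES`, and all latency and size figures, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCandidate:
    """One candidate split point (all values are hypothetical)."""
    name: str
    head_ms: float   # assumed latency of the head model on the mobile device
    tail_ms: float   # assumed latency of the tail model on the server
    elements: int    # number of values in the tensor sent at the split point
    bitwidth: int    # quantization bitwidth of that tensor

    @property
    def payload_bits(self) -> int:
        # Fewer bits per value means a smaller transfer at this split point.
        return self.elements * self.bitwidth


def end_to_end_ms(c: SplitCandidate, bandwidth_mbps: float) -> float:
    """Assumed cost model: head compute + network transfer + tail compute."""
    transfer_ms = c.payload_bits / (bandwidth_mbps * 1e6) * 1e3
    return c.head_ms + transfer_ms + c.tail_ms


def select_split(candidates, bandwidth_mbps: float) -> SplitCandidate:
    """Pick the candidate with the lowest estimated end-to-end time."""
    return min(candidates, key=lambda c: end_to_end_ms(c, bandwidth_mbps))


# Illustrative candidates only; these are not measurements from the article.
CANDIDATES = [
    SplitCandidate("early", head_ms=5.0, tail_ms=10.0,
                   elements=1_000_000, bitwidth=8),
    SplitCandidate("late", head_ms=30.0, tail_ms=2.0,
                   elements=100_000, bitwidth=4),
]

if __name__ == "__main__":
    for mbps in (100.0, 1000.0):
        best = select_split(CANDIDATES, mbps)
        print(mbps, best.name, round(end_to_end_ms(best, mbps), 2))
```

Under these assumed numbers, the late split is preferred on the slower link because its small, low-bitwidth payload is cheap to transmit, while the early split is preferred on the faster link because transfer time no longer dominates.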
Database: |
Directory of Open Access Journals |