Abstract: |
Deploying deep learning architectures on devices with limited computational resources is challenging due to their large number of parameters and high computational complexity. These heavy, complex architectures increase latency in real-time applications. However, splitting a deep architecture into subnets whose forward propagation runs in parallel across multiple resource-constrained devices, and then aggregating their predictions, can reduce latency while preserving performance. In this paper, we propose a novel deep learning architecture called Ensembled Parallel Networks (EnParaNets) that leverages network dissection, knowledge distillation, and ensemble learning to reduce inference time while maintaining, and in some cases even exceeding, the baseline accuracy in real-time applications. The methodology splits the original network into N equal-sized blocks, forms a Sub-ParaNet for each block, and enhances their representations using (A) contrastive knowledge distillation together with minimizing the Kullback–Leibler divergence between the logit distributions of the teacher and student networks, and (B) an L2 loss between the intermediate representations of the original network and the corresponding Sub-ParaNets. The predictive distributions of the Sub-ParaNets are ensembled to form the final EnParaNet. Using training methods A and B, respectively, the proposed EnParaNet outperforms the baseline models of seven diverse architectures: ResNet56, VGG_13, WRN_40_2, DenseNet, ResNeXt50, MobileNetv2, and ShuffleNetv2 in accuracy while significantly reducing inference time. With training method A, EnParaNet improves ResNet56, VGG_13, WRN_40_2, MobileNetv2, DenseNet, ResNeXt50, and ShuffleNetv2 by 2.69%, 0.24%, 1.95%, 7.69%, 0.33%, 2.13%, and 3.12%, respectively, while reducing inference time by 45%, 24%, 47%, 31%, 33%, 32%, and 44%, respectively. With training method B, EnParaNet achieves improvements of 1.75%, 2.90%, 1.09%, 3.91%, and 1.66%, with inference time reductions of 50%, 42%, 49%, 48%, and 49%, respectively. Moreover, a comprehensive ablation study analyzes the performance of the proposed technique and highlights its effectiveness and challenges. We also evaluate EnParaNet on transferability and adversarial robustness tasks.
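
To make the two training signals and the ensembling step concrete, the sketch below illustrates them in PyTorch under our own assumptions: the names (SubParaNet, kd_kl_loss, feature_l2_loss, ensemble_logits) are hypothetical, each Sub-ParaNet is assumed to consume the raw input directly, and the contrastive distillation term of method A is omitted for brevity. This is a minimal illustration, not the authors' reference implementation.

```python
# Minimal sketch of the ideas summarized in the abstract. All names here are
# illustrative assumptions, not the authors' reference implementation; the
# contrastive distillation term of training method A is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


def split_into_blocks(layers, n_blocks):
    """Split an ordered list of layers into roughly equal-sized blocks."""
    size = -(-len(layers) // n_blocks)  # ceiling division
    return [nn.Sequential(*layers[i:i + size]) for i in range(0, len(layers), size)]


class SubParaNet(nn.Module):
    """One parallel student: a block taken from the original network plus a
    lightweight classification head (assumed structure)."""

    def __init__(self, block, feat_dim, num_classes):
        super().__init__()
        self.block = block
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.block(x)                         # intermediate representation
        logits = self.head(self.pool(feat).flatten(1))
        return feat, logits


def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """Part of training method A: KL divergence between the softened logit
    distributions of the teacher and the student."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2


def feature_l2_loss(student_feat, teacher_feat):
    """Training method B: L2 loss between the intermediate representation of
    the original network and that of the corresponding Sub-ParaNet."""
    return F.mse_loss(student_feat, teacher_feat)


def ensemble_logits(per_subnet_logits):
    """EnParaNet prediction: average the Sub-ParaNets' predictive distributions."""
    probs = torch.stack([F.softmax(l, dim=1) for l in per_subnet_logits])
    return probs.mean(dim=0)
```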