Resource Efficient AI: Exploring Neural Network Pruning for Task Specialization
Authors: | Dieter Balemans, Philippe Reiter, Jan Steckel, Peter Hellinckx |
---|---|
Year of publication: | 2022 |
Subject: |
Computer. Automation
History; Polymers and Plastics; Industrial and Manufacturing Engineering; Computer Science Applications; Artificial Intelligence; Hardware and Architecture; Management of Technology and Innovation; Mass communications; Computer Science (miscellaneous); Business and International Management; Engineering (miscellaneous); Software; Information Systems |
Source: | Internet of Things |
ISSN: | 1556-5068, 2542-6605 |
DOI: | 10.2139/ssrn.4158433 |
Description: | This paper explores the use of neural network pruning in transfer learning applications to achieve more resource-efficient inference. The goal is to focus and optimize a neural network on a smaller, specialized target task. With the advent of IoT, we have seen an immense increase in AI-based applications on mobile and embedded devices, such as wearables and other smart appliances. However, with the ever-increasing complexity and capabilities of machine learning algorithms, this push to the edge has led to new challenges due to the limited resources available on these devices. Some form of compression is needed to allow state-of-the-art convolutional neural networks to run on edge devices. In this work, we adapt existing neural network pruning methods so that they specialize networks to focus on only a subset of what they were originally trained for. This is a transfer learning use case in which we optimize large pre-trained networks. It differs from standard optimization techniques in that the network is allowed to forget certain concepts, which lets its footprint shrink even further. We compare different pruning criteria, including one from the field of Explainable AI (XAI), to determine which technique yields the smallest possible network while maintaining high performance on the target task. Our results show the benefits of network specialization when executing neural networks on embedded devices, both with and without GPU acceleration. |
Database: | OpenAIRE |
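The idea of specializing a network by pruning against a subset of its original classes can be illustrated with a minimal sketch. The abstract does not name the pruning criteria it compares, so the criterion below (mean absolute activation per channel, measured only on target-class samples) is an assumed stand-in and not the authors' method. The function names, the toy activations, and the `keep_ratio` parameter are all hypothetical.

```python
from typing import List, Sequence, Set


def channel_importance(
    activations: Sequence[Sequence[float]],
    labels: Sequence[int],
    target_classes: Set[int],
) -> List[float]:
    """Mean absolute activation per channel, using only target-class samples.

    Assumed criterion for illustration; the paper compares several others.
    """
    rows = [a for a, y in zip(activations, labels) if y in target_classes]
    if not rows:
        raise ValueError("no samples belong to the target classes")
    n_channels = len(rows[0])
    return [sum(abs(r[c]) for r in rows) / len(rows) for c in range(n_channels)]


def channels_to_keep(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Indices of the highest-scoring channels, returned in ascending order."""
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data (hypothetical): 4 samples, 4 channels. Channel 2 fires only for
# class 1, which lies outside the target task, so specialization can drop it.
activations = [
    [0.9, 0.1, 0.0, 0.5],  # class 0
    [0.8, 0.2, 0.0, 0.4],  # class 0
    [0.1, 0.1, 0.9, 0.5],  # class 1
    [0.0, 0.2, 1.0, 0.6],  # class 1
]
labels = [0, 0, 1, 1]

scores = channel_importance(activations, labels, target_classes={0})
kept = channels_to_keep(scores, keep_ratio=0.5)
print(kept)
```

Scoring on the target classes alone is what lets the network "forget": a channel that matters only for discarded classes receives a low score and is pruned, whereas scoring on all classes would retain it. In practice the surviving channels would then be copied into a smaller layer and the network fine-tuned on the target task.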