Sensitivity Analysis and Compression Opportunities in DNNs Using Weight Sharing
Authors: | Alberto Bosio, Etienne Dupuis, David Novo, Ian O'Connor |
---|---|
Contributors: | École Centrale de Lyon (ECL), Université de Lyon, ADAptive Computing (ADAC), Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM), Centre National de la Recherche Scientifique (CNRS), Université de Montpellier (UM) |
Language: | English |
Year of publication: | 2020 |
Subject: |
Design space exploration
Computer science; Machine learning; Weight sharing; Embedded systems; Sensitivity (control systems); Pruning (decision trees); Approximate computing; Cluster analysis; Quantization (image processing); Deep neural networks; Artificial neural networks; Data compression ratio; Hardware accelerator; Artificial intelligence; Model compression |
Source: | 23rd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), Apr 2020, Novi Sad, Serbia. ⟨10.1109/DDECS50862.2020.9095658⟩ |
DOI: | 10.1109/DDECS50862.2020.9095658 |
Description: | International audience; Deep artificial Neural Networks (DNNs) are currently among the most intensively and widely used predictive models in machine learning. However, the computational workload of DNNs is typically out of reach for low-power embedded devices. The approximate computing paradigm can be exploited to reduce DNN complexity: it improves performance and energy efficiency by relaxing the need for fully accurate operations. There is a large number of implementation options leveraging many approximation techniques (e.g., pruning, quantization, weight sharing, low-rank factorization, knowledge distillation). However, to the best of our knowledge, few or no automated approaches exist to explore, select, and generate the best approximate version of a given DNN according to design objectives. The goal of this paper is to demonstrate that the design space exploration phase can enable significant network compression without noticeable accuracy loss. We demonstrate this via an example based on weight sharing and show that our direct conversion method obtains a 4.85x compression rate with 0.14% accuracy loss on ResNet18 and a 4.91x compression rate with 0.44% accuracy loss on SqueezeNet, without involving any retraining steps. |
Database: | OpenAIRE |
External link: |
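The weight-sharing idea summarized in the abstract can be illustrated with a minimal sketch: cluster a layer's weights into a small number of shared values and store low-bit cluster indices plus a short codebook instead of full-precision floats. This is a generic 1-D k-means formulation for illustration only, not the paper's actual "direct conversion" method; the function name and parameters are hypothetical.

```python
import numpy as np

def share_weights(w, k=16, iters=20):
    """Illustrative weight sharing: 1-D k-means over a layer's weights.

    Returns the weights with each value replaced by its cluster centroid,
    the per-weight cluster indices, and the k-entry codebook.
    """
    flat = w.ravel()
    # Initialize centroids uniformly over the weight range.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for c in range(k):
            members = flat[idx == c]
            if members.size:
                centroids[c] = members.mean()
    idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx].reshape(w.shape), idx.reshape(w.shape), centroids

# Example: compress a random "layer" of float32 weights.
w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
shared, idx, centroids = share_weights(w, k=16)

# Storage after sharing: 4-bit indices plus 16 float32 codebook entries,
# versus 32 bits per weight originally.
orig_bits = w.size * 32
shared_bits = w.size * 4 + centroids.size * 32
print(f"compression rate: {orig_bits / shared_bits:.2f}x")
```

With k=16 shared values, each weight needs only a 4-bit index, so the compression rate approaches 8x for large layers; the paper's reported 4.85x-4.91x rates reflect the accuracy-driven, per-layer choice of cluster counts that its design space exploration performs.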