Authors: |
Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan |
Language: |
English |
Publication year: |
2024 |
Subject: |
|
Source: |
Frontiers in Artificial Intelligence, Vol 7 (2024) |
Document type: |
article |
ISSN: |
2624-8212 |
DOI: |
10.3389/frai.2024.1268317 |
Description: |
In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC accelerators achieves high degrees of parallelism. However, two challenges that arise in this approach are the highly non-uniform distribution of layer processing times and high area requirements. We propose LRMP, a method to jointly apply layer replication and mixed precision quantization to improve the performance of DNNs when mapped to area-constrained IMC accelerators. LRMP uses a combination of reinforcement learning and mixed integer linear programming to search the replication-quantization design space using a model that is closely informed by the target hardware architecture. Across five DNN benchmarks, LRMP achieves 2.6–9.3× latency and 8–18× throughput improvement at minimal ( |
Database: |
Directory of Open Access Journals |