Functionality-Based Processing-in-Memory Accelerator for Deep Convolutional Neural Networks
Author: | Su-Kyung Yoon, Jeong-Geun Kim, Min-Jae Kim, Shin-Dug Kim |
---|---|
Publication year: | 2021 |
Subject: | prefetch; Hardware_MEMORYSTRUCTURES; General Computer Science; Hybrid Memory Cube; Computer science; computer system; Bandwidth (signal processing); deep neural network; General Engineering; 3D memory; accelerator architectures; Energy consumption; Convolutional neural network; TK1-9971; Computer architecture; Shared memory; artificial intelligence accelerator; Scalability; General Materials Science; Electrical engineering. Electronics. Nuclear engineering; Latency (engineering); Conventional memory |
Source: | IEEE Access, Vol 9, Pp 145098-145108 (2021) |
ISSN: | 2169-3536 |
Description: | Processing-in-memory (PIM) architectures are advantageous for applications that generate complicated memory request patterns; such memory streams usually degrade application performance in conventional memory hierarchy systems. In particular, deep convolutional neural network (DCNN) processing, which consists of several functionalities, could be highly optimized if PIM cores can extend the processing capability and data accessibility. In this work, we propose a functionality-based PIM accelerator for DCNNs. We design several modules in addition to the conventional PIM system based on a hybrid memory cube (HMC). First, we compose a new buffer module, namely a shared cache, in which PIM cores are provided with DCNN functionalities and pre-trained weights. The PIM cores subsequently enhance computational utilization and data accessibility. Second, an efficient replacement method complements the shared cache to optimize the data miss rate of DCNN processing. Third, we compose dual prefetchers that can deal with DCNN’s memory access patterns, thereby reducing the system’s overall latency. Fourth, we compose a PIM scheduler for PIM core-level autonomous request control. The PIM scheduler relieves the host processor of significant computational loads, improving the overall latency of the system and reducing energy consumption. In a performance evaluation based on a trace-driven HMC simulator, our proposed model improves average latency and bandwidth by 38.9% and 27.9%, respectively, with only 18.7% more energy consumption than conventional HMC-based PIM systems. Our system also achieves scalable processing performance: as the DCNN becomes deeper, it processes faster than conventional PIM systems. |
Database: | OpenAIRE |
External link: |