McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM
Author: | Sungjoo Yoo, Eunhyeok Park, Sungho Park, Dongyoung Kim, Hyun-Sung Shin, Park Yong-Sik |
---|---|
Year of publication: | 2018 |
Subject: |
010302 applied physics;
Random access memory; Speedup; Computer science; 02 engineering and technology; Parallel computing; 01 natural sciences; Computer Graphics and Computer-Aided Design; 020202 computer hardware & architecture; Memory management; MCDRAM; 0103 physical sciences; Memory architecture; 0202 electrical engineering, electronic engineering, information engineering; Overhead (computing); Electrical and Electronic Engineering; Latency (engineering); Software; DRAM |
Source: | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 37:2613-2622 |
ISSN: | 1937-4151 0278-0070 |
DOI: | 10.1109/tcad.2018.2857044 |
Description: | We propose McDRAM, a novel memory architecture for in-memory computation in which DRAM dies are equipped with a large number of multiply-accumulate (MAC) units to perform matrix computation for neural networks. By exploiting high internal memory bandwidth and reducing off-chip memory accesses, McDRAM achieves both low-latency and energy-efficient computation. In our experiments, we obtained the chip layout based on a state-of-the-art memory, LPDDR4, in which McDRAM is equipped with 2048 MACs in a single chip package with a small area overhead (4.7%). Compared with the state-of-the-art accelerator, TPU, and the power-efficient GPU, Nvidia P4, McDRAM offers $9.5\times$ and $14.4\times$ speedups, respectively, when large-scale MLPs and RNNs use a batch size of 1. McDRAM also delivers $2.1\times$ and $3.7\times$ better computational efficiency in TOPS/W than TPU and P4, respectively, for large batches. |
Database: | OpenAIRE |
External link: |