PISA-DMA: Processing-in-Memory Instruction Set Architecture Using DMA

Autor:	Won Jun Lee, Chang Hyun Kim, Yoonah Paik, Seon Wook Kim
Jazyk:	angličtina
Rok vydání:	2023
Předmět:	Processing-in-DRAM direct memory access instruction set architecture PIM offloading Electrical engineering. Electronics. Nuclear engineering TK1-9971
Zdroj:	IEEE Access, Vol 11, Pp 8622-8632 (2023)
Druh dokumentu:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2023.3238812
Popis:	Processing-in-memory (PIM) has attracted attention to overcome the memory bandwidth limitation, especially for computing memory-intensive DNN applications. Most PIM approaches use the CPU’s memory requests to deliver instructions and operands to the PIM engines, making a core busy and incurring unnecessary data transfer, thus, resulting in significant offloading overhead. DMA can resolve the issue by transferring a high volume of successive data without intervening CPU and polluting the memory hierarchy, thus perfectly fitting the PIM concept. However, the small computing resources of DRAM-based PIM devices allow us to transfer only small amounts of data at one DMA transaction and require a large number of descriptors, thus still incurring significant offloading overhead. This paper introduces PIM Instruction Set Architecture (ISA) using a DMA descriptor called PISA-DMA to express a PIM opcode and operand in a single descriptor. Our ISA makes PIM programming intuitive by thinking of committing one PIM instruction as completing one DMA transaction and representing a sequence of PIM instructions using the DMA descriptor list. Also, PISA-DMA minimizes the offloading overhead while guaranteeing compatibility with commercial platforms. Our PISA-DMA eliminates the opcode offloading overhead and achieves 1.25x, 1.31x, and 1.29x speedup over the baseline PIM at the sequence length of 128 with the BERT, RoBERTa, and GPT-2 models, respectively, in ONNX runtime with real machines. Also, we study how our proposed PISA affects performance in compiler optimization and show that the operator fusion of matrix-matrix multiplication and element-wise addition achieved 1.04x speedup, a similar performance gain using conventional ISAs.
Databáze:	Directory of Open Access Journals
Externí odkaz:	https://doaj.org/article/2f4474122e7a48469850b71558444ba4 Zobrazit plný text záznamu View record in DOAJ