An 8.8 TFLOPS/W Floating-Point RRAM-Based Compute-in-Memory Macro Using Low Latency Triangle-Style Mantissa Multiplication

Authors: Hu, Xianwu; Wang, Yu; Ma, Zizhao; Wen, Gan; Wang, Zeming; Lu, Zhichao; Liu, Yunlong; Li, Yanlei; Liang, Xingdong; Zeng, Xiaoyang; Xie, Yufeng
Source: Circuits and Systems II: Express Briefs, IEEE Transactions on; November 2023, Vol. 70, Issue 11, pp. 4216-4220, 5 pp.
Abstract: High-precision computation with low latency and high energy efficiency is required for AI-driven applications and scientific computing. Emerging compute-in-memory (CIM) technology shows great potential to accelerate the multiply-and-accumulate (MAC) operations that are frequently executed in such scenarios. Resistive RAM (RRAM) is highly suitable for CIM due to features such as nonvolatility, small cell size, and a MAC-friendly structure. However, existing RRAM CIM designs focus on accelerating fixed-point/integer operations. Several works adopt a logic-CIM structure to support high-precision floating-point (FP) calculations, but they require many cycles and a large area to perform a single FP operation. To meet the need for low latency and high energy efficiency in widely used FP calculations, we propose an accelerated FP-MAC architecture based on a 40 nm RRAM CIM array. A fully parallel data input scheme and a triangle weight arrangement are proposed for low-latency multi-bit multiplication. An array of non-uniformly grouped sense amplifiers (NUGSAs) is adopted to save energy and area. Experiments show that the proposed FP-MAC design achieves an energy efficiency of up to 8.8 TFLOPS/W in FP8 mode and 3.3 TFLOPS/W in bFP16 mode, with a computing latency of 3.34 ns.
Database: Supplemental Index