A Floating-Point 6T SRAM In-Memory-Compute Macro Using Hybrid-Domain Structure for Advanced AI Edge Chips

Autor: Wu, Ping-Chun, Su, Jian-Wei, Hong, Li-Yang, Ren, Jin-Sheng, Chien, Chih-Han, Chen, Ho-Yu, Ke, Chao-En, Hsiao, Hsu-Ming, Li, Sih-Han, Sheu, Shyh-Shyuan, Lo, Wei-Chung, Chang, Shih-Chieh, Lo, Chung-Chuan, Liu, Ren-Shuo, Hsieh, Chih-Cheng, Tang, Kea-Tiong, Chang, Meng-Fan
Zdroj: IEEE Journal of Solid-State Circuits; January 2024, Vol. 59 Issue: 1 p196-207, 12p
Abstrakt: Advanced artificial intelligence edge devices are expected to support floating-point (FP) multiply and accumulation operations while ensuring high energy efficiency and high inference accuracy. This work presents an FP compute-in-memory (CIM) macro that exploits the advantages of computing in the time, digital, and analog-voltage domain for high energy efficiency and accuracy. This work employs: 1) a hybrid-domain macrostructure to enable the computation of both the exponent and mantissa within the same CIM macro; 2) a time-domain computing scheme for energy-efficient exponent computation; 3) a product-exponent-based input-mantissa alignment scheme to enable the accumulation of the product mantissa in the same column; and 4) a place-value-dependent digital–analog-hybrid computing scheme to enable energy-efficient mantissa computations of sufficient accuracy. A 22-nm 832-kB FP-CIM macro fabricated using foundry-provided compact 6T-static random access memory (SRAM) cells achieved a high energy efficiency of 72.14 tera-floating-point operations per second (TFLOPS)/W while performing FP-multiply-and-accumulate (MAC) operations involving BF16-input, BF16-weight, FP32-output, and 128 accumulations.
Databáze: Supplemental Index