Author:
Xin Zhang, Vijayalakshmi Srinivasan, Wei Wang, Jungwook Choi, Siyu Koswatta, Mingu Kang, Li Yulong, Bruce M. Fleischer, Radhika Jain, Michael R. Scheuermann, Kerstin Schelm, Kailash Gopalakrishnan, Monodeep Kar, Zhibin Ren, Michael A. Guillorn, Swagath Venkataramani, Howard M. Haynie, Xiao Sun, Matthew M. Ziegler, Hung Tran, Sae Kyu Lee, Kyu-hyoun Kim, Joel Abraham Silberman, Martin Lutz, Silvia Melitta Mueller, Sunil Shukla, Pong-Fei Lu, Vidhi Zalani, Ching Zhou, Brian W. Curran, Vinay Velji Shah, Naigang Wang, Leland Chang, Robert Casatuta, Alyssa Herbert, Nianzheng Cao, Scot H. Rider, Marcel Schaal, Ankur Agrawal, Jinwook Oh, Jinwook Jung, James J. Bonanno, Matthew Cohen, Chia-Yu Chen
Year of publication:
2021 |
Subject:
|
Source:
ISSCC |
Description:
Low-precision computation is the key enabling factor for achieving high compute density (TOPS/W and TOPS/mm²) in AI hardware accelerators across cloud and edge platforms. However, robust deep learning (DL) model accuracy, equivalent to that of high-precision computation, must be maintained. Improvements in bandwidth, architecture, and power management are also required to harness the benefit of reduced precision: feeding and sustaining more parallel engines achieves high utilization and optimizes performance within a given product power envelope. In this work, we present a 4-core AI chip in 7nm EUV technology that exploits cutting-edge algorithmic advances for iso-accurate models in low-precision training and inference [1, 2] and aggressive circuit/architecture optimization to achieve leading-edge power-performance. The chip supports fp16 (DLFloat16 [8]) and hybrid-fp8 (hfp8) [1] formats for training and inference of DL models, as well as int4 and int2 formats for highly scaled inference.
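To make the reduced-precision formats concrete, here is a minimal sketch of how a value can be rounded to a generic low-bit floating-point format. The function name `quantize_fp` and its structure are illustrative assumptions, not the chip's or the cited papers' implementation; the hfp8 scheme of [1] pairs a 1-4-3 (sign/exponent/mantissa) format for the forward pass with a 1-5-2 format for gradients, which this helper can emulate by varying its bit-width arguments. Infinities and NaNs are ignored; out-of-range values saturate.

```python
import math

def quantize_fp(x, exp_bits, man_bits):
    """Round x to the nearest value representable with the given
    exponent/mantissa bit widths (illustrative sketch, no inf/nan)."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))           # abs(x) = m * 2**e, m in [0.5, 1)
    m, e = m * 2.0, e - 1               # rewrite so m is in [1, 2)
    e_min = 1 - bias                    # smallest normal exponent
    e_max = (2 ** exp_bits - 2) - bias  # largest exponent (top code reserved)
    if e < e_min:                       # subnormal range: fix the exponent
        m *= 2.0 ** (e - e_min)
        e = e_min
    q = round(m * 2 ** man_bits) / 2 ** man_bits  # round mantissa to man_bits
    if q >= 2.0:                        # rounding overflowed the mantissa
        q, e = q / 2.0, e + 1
    if e > e_max:                       # saturate instead of overflowing
        return sign * (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    return sign * q * 2.0 ** e

# Emulate the two hfp8 sub-formats on the same value:
fwd = quantize_fp(0.1, 4, 3)   # 1-4-3 forward format
grad = quantize_fp(0.1, 5, 2)  # 1-5-2 gradient format
```

With only 3 mantissa bits, `quantize_fp(0.1, 4, 3)` returns 0.1015625 rather than 0.1, which illustrates why iso-accuracy at such precisions requires the algorithmic techniques cited in [1, 2] rather than naive rounding alone.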
Database:
OpenAIRE |
External link:
|