A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

Authors:	Shih-Hsien Lo, Brian W. Curran, Jinwook Oh, Howard M. Haynie, Vijayalakshmi Srinivasan, Leland Chang, Fanchieh Yee, Tina Babinsky, Joel Abraham Silberman, George D. Gristede, Matthew M. Ziegler, Gary W. Maier, Bruce M. Fleischer, Michael R. Scheuermann, Nianzheng Cao, Ankur Agrawal, Ching Zhou, Chia-Yu Chen, Silvia Melitta Mueller, Jungwook Choi, Naigang Wang, Kailash Gopalakrishnan, Thomas W. Fox, Sunil Shukla, Swagath Venkataramani, Michael J. Klaiber, Christos Vezyrtzis, Pierce Chuang, Dongsoo Lee, Michael A. Guillorn, Pong-Fei Lu
Year of publication:	2018
Subject:	Multi-core processor; Floating point; Artificial neural network; business.industry; Computer science; Deep learning; 020208 electrical & electronic engineering; Inference; 02 engineering and technology; Parallel computing; 020202 computer hardware & architecture; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Artificial intelligence; business; Dataflow architecture; Integer (computer science)
Source:	VLSI Circuits
Description:	A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy in training and inference, as well as 1b/2b (binary/ternary) integer for aggressive inference performance. At 1.5 GHz, the AI core prototype achieves 1.5 TFLOPS fp16, 12 TOPS ternary, or 24 TOPS binary peak performance in 14 nm CMOS.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::9467c7d49cc29eaa6eb08be982212c8b https://doi.org/10.1109/vlsic.2018.8502276
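
Note on the figures in the description: the peak numbers quoted there can be related to the 1.5 GHz clock with simple arithmetic. The sketch below is only a back-of-envelope check, not anything taken from the paper itself. The function names are hypothetical, the 0.90 factor is the lower bound implied by ">90% sustained utilization", and how an "operation" is counted (for example, whether a multiply-accumulate counts as two) is an assumption the record does not settle.

```python
# Back-of-envelope check of the peak figures quoted in the description.
# All names here are illustrative; only the numbers come from the record.

CLOCK_HZ = 1.5e9  # 1.5 GHz prototype clock

# Peak throughput per precision mode, in operations per second.
PEAK_OPS_PER_S = {
    "fp16": 1.5e12,     # 1.5 TFLOPS
    "ternary": 12e12,   # 12 TOPS (2b)
    "binary": 24e12,    # 24 TOPS (1b)
}


def ops_per_cycle(peak_ops_per_s, clock_hz=CLOCK_HZ):
    """Operations completed per clock cycle at peak rate."""
    return peak_ops_per_s / clock_hz


def sustained_lower_bound(peak_ops_per_s, utilization=0.90):
    """Throughput at the assumed 90% utilization floor (an estimate only)."""
    return peak_ops_per_s * utilization


if __name__ == "__main__":
    for mode, peak in PEAK_OPS_PER_S.items():
        print(
            f"{mode:8s} {ops_per_cycle(peak):8.0f} ops/cycle, "
            f"at least {sustained_lower_bound(peak) / 1e12:.2f} Tops/s sustained"
        )
```

Under these assumptions the binary mode delivers 16 times the fp16 operations per cycle and the ternary mode 8 times, which matches the 24 : 12 : 1.5 ratio of the quoted peaks.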