Showing 1 - 10 of 28 for search: '"Thomas W. Fox"'
Author:
Michael J. Klaiber, George D. Gristede, Shih-Hsien Lo, Hiroshi Inoue, Leland Chang, Christos Vezyrtzis, Jungwook Choi, Gary W. Maier, Fanchieh Yee, Shubham Jain, Brian W. Curran, Jintao Zhang, Mingu Kang, Howard M. Haynie, Mauricio J. Serrano, Pong-Fei Lu, Silvia Melitta Mueller, Matthew M. Ziegler, Bruce M. Fleischer, Kazuaki Ishizaki, Kailash Gopalakrishnan, Michael R. Scheuermann, Ankur Agrawal, Xiao Sun, Sunil Shukla, Thomas W. Fox, Vijayalakshmi Srinivasan, Tina Babinsky, Swagath Venkataramani, Michael A. Guillorn, Ching Zhou, Nianzheng Cao, Eri Ogawa, Naigang Wang, Moriyoshi Ohara, Joel Abraham Silberman, Jinwook Oh, Marcel Schaal, Chia-Yu Chen, Wei Wang
Published in:
Proceedings of the IEEE. 108:2232-2250
Advances in deep neural networks (DNNs) and the availability of massive real-world data have enabled superhuman levels of accuracy on many AI tasks and ushered the explosive growth of AI workloads across the spectrum of computing devices. However, th
Author:
Matthew M. Ziegler, Sunil Shukla, Gary W. Maier, Jinwook Oh, Kailash Gopalakrishnan, Christos Vezyrtzis, Thomas W. Fox, Michael J. Klaiber, Howard M. Haynie, Swagath Venkataramani, Leland Chang, Jungwook Choi, Nianzheng Cao, Pong-Fei Lu, Pierce Chuang, Michael A. Guillorn, Brian W. Curran, Dongsoo Lee, Fanchieh Yee, Ankur Agrawal, Ching Zhou, Silvia Melitta Mueller, Naigang Wang, George D. Gristede, Bruce M. Fleischer, Michael R. Scheuermann, Tina Babinsky, Vijayalakshmi Srinivasan, Chia-Yu Chen, Joel Abraham Silberman, Shih-Hsien Lo
Published in:
IEEE Solid-State Circuits Letters. 1:217-220
This letter presents a multi-TOPS AI accelerator core for deep learning training and inference. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employin
Author:
Gary W. Maier, Wei Wang, Siyu Koswatta, Vijayalakshmi Srinivasan, Howard M. Haynie, George D. Gristede, Bruce M. Fleischer, Michael R. Scheuermann, Matthew M. Ziegler, Sunil Shukla, Jinwook Oh, Vicktoria Ivanov, Kailash Gopalakrishnan, Martin Lutz, Ching Zhou, Xiao Sun, Silvia Melitta Mueller, Brian W. Curran, Pong-Fei Lu, Thomas W. Fox, Swagath Venkataramani, Nianzheng Cao, Ankur Agrawal, Robert Casatuta, Naigang Wang, Jungwook Choi, Vinay Velji Shah, Alex Mesh, Marcel Schaal, Scot H. Rider, Fanchieh Yee, Joel Abraham Silberman, James J. Bonanno, Michael A. Guillorn, Mingu Kang, Sae Kyu Lee, Shimon Ben-Yehuda, Erez Ophir, Chia-Yu Chen, Matthew Cohen, Yevgeny Nustov, Leland Chang, Shih-Hsien Lo
Published in:
VLSI Circuits
A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architec
Author:
Jungwook Choi, Ching Zhou, Naigang Wang, Ankur Agrawal, Michael J. Klaiber, Matthew M. Ziegler, Fanchieh Yee, Shih-Hsien Lo, Sunil Shukla, George D. Gristede, Bruce M. Fleischer, Michael R. Scheuermann, Chia-Yu Chen, Michael A. Guillorn, Kailash Gopalakrishnan, Joel Abraham Silberman, Jinwook Oh, Howard M. Haynie, Thomas W. Fox, Vijayalakshmi Srinivasan, Brian W. Curran, Gary W. Maier, Swagath Venkataramani, Nianzheng Cao, Pong-Fei Lu, Christos Vezyrtzis, Tina Babinsky, Silvia Melitta Mueller, Pierce Chuang, Leland Chang, Dongsoo Lee
Published in:
ISLPED
The combination of growth in compute capabilities and availability of large datasets has led to a re-birth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains across visio
Author:
Shih-Hsien Lo, Brian W. Curran, Jinwook Oh, Howard M. Haynie, Vijayalakshmi Srinivasan, Leland Chang, Fanchieh Yee, Tina Babinsky, Joel Abraham Silberman, George D. Gristede, Matthew M. Ziegler, Gary W. Maier, Bruce M. Fleischer, Michael R. Scheuermann, Nianzheng Cao, Ankur Agrawal, Ching Zhou, Chia-Yu Chen, Silvia Melitta Mueller, Jungwook Choi, Naigang Wang, Kailash Gopalakrishnan, Thomas W. Fox, Sunil Shukla, Swagath Venkataramani, Michael J. Klaiber, Christos Vezyrtzis, Pierce Chuang, Dongsoo Lee, Michael A. Guillorn, Pong-Fei Lu
Published in:
VLSI Circuits
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range
Author:
Peter Boyle, David L. Satterfield, Philip Heidelberger, Martin Ohmacht, Changhoan Kim, George Liang-Tai Chiu, Norman H. Christ, Alan Gara, Matthias A. Blumrich, R. A. Haring, Robert W. Wisniewski, Thomas W. Fox, Michael K. Gschwind, Paul W. Coteus, Krishnan Sugavanam
Published in:
IEEE Micro. 32:48-60
Blue Gene/Q aims to build a massively parallel high-performance computing system out of power-efficient processor chips, resulting in power-efficient, cost-efficient, and floor-space-efficient systems. Focusing on reliability during design helps wit
Author:
M. Biberstein, Uzi Shvadron, Amir Geva, Thomas W. Fox, D. Naishlos, Krishnan K. Kailas, Malcolm Scott Ware, Fredy D. Neeser, Shay Ben-David, Hillery C. Hunter, Victor Zyuban, Jeff H. Derby, Sameh W. Asaad, Ayal Zaks, Jaime H. Moreno, Daniel J. Littrell
Published in:
IBM Journal of Research and Development. 47:299-326
We describe an innovative, low-power, high-performance, programmable signal processor (DSP) for digital communications. The architecture of this processor is characterized by its explicit design for low-power implementations, its innovative ability t
Author:
Junichi Mihara, Sherman M. Dance, Leland Chang, Bruce M. Fleischer, Robert Shearer, Kyle M. Holmes, Sebastian Ehrenreich, Dieter Wendel, Gary S. Ditlow, Robert K. Montoye, Salvatore N. Storino, Shohji Onishi, Yutaka Nakamura, Thomas W. Fox
Published in:
ISSCC
In multi-ported register files, memory cell size grows quadratically with the total number of ports due to wordline and bitline wiring. Reducing the number of physical access ports in a memory cell can thus lead to significant area and power savings
Author:
Carlos Costa, Carlo Bertolli, Yoonho Park, Patrick Siegl, Tong Chen, Arpith C. Jacob, Changhoan Kim, Philip Jacob, John Kevin Patrick O'Brien, Daniel A. Prener, Constantinos Evangelinos, Diego Sanchez Gallo, Jose R. Brunheroto, Pradip Bose, Samuel Antao, Martin Ohmacht, Chen-Yong Cher, J. Doi, Bruce M. Fleischer, Ravi Nair, Olivier Sallenave, Thomas W. Fox, John A. Gunnels, Leopold Grinberg, Hans M. Jacobson, Jaime H. Moreno, Kyung Dong Ryu, Bryan S. Rosenburg, Zehra Sura, Tejas Karkhanis, Mauricio J. Serrano, Krishnan Sugavanam
Published in:
IBM Journal of Research and Development. 59:17:1-17:14
Many studies point to the difficulty of scaling existing computer architectures to meet the needs of an exascale system (i.e., capable of executing $10^{18}$ floating-point operations per second), consuming no more than 20 MW in power, by around the
Author:
Daniel J. Littrell, Victor Zyuban, Jaime H. Moreno, Sameh W. Asaad, Anne-Marie Haen, Thomas W. Fox
Published in:
ACM Great Lakes Symposium on VLSI
We describe a semi-custom design methodology for embedded processor cores that was prototyped through the development of a low power high performance DSP core. When compared to the standard ASIC design flow, this methodology enables significant impro