Showing 1 - 10 of 14 for search: '"Ephrem C. Wu"'
Author:
Ephrem C. Wu
We describe a method for training accurate Transformer machine-translation models to run inference using 8-bit integer (INT8) hardware matrix multipliers, as opposed to the more costly single-precision floating-point (FP32) hardware. Unlike previous
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9dcf6d8e78f134b92e36abc4f4ffdfdb
http://arxiv.org/abs/2001.00926
Author:
Chen Dewei, Zhang Bo, Ephrem C. Wu, Jie Miao, Wang Yuwei, Meng Yu, Zhang Heng, Yu Xiaoyu, Biao Min, Gao Jianlin
Published in:
FPL
Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with reconfigurable logic pr
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2928da3aa77aa72962828da3af70554a
http://arxiv.org/abs/1909.07973
Published in:
FPGA
To enhance the performance of FPGA-based neural-network accelerators, maximizing both operating clock rates and compute efficiency is paramount. Streamlining data movement between memory and compute holds the key to boosting these metrics. To unleash
Author:
Mark Charlebois, Dave Fick, Sachin Satish Idgunji, Yuchen Zhou, Michael Thomson, Ashish Sirasao, George Yuan, Anton Lokhmotov, Koichi Yamada, Tom St. John, Bing Yu, Jeff Jiao, Arun Tejusve Raghunath Rajan, Paulius Micikevicius, Ephrem C. Wu, Francisco Massa, Carole-Jean Wu, Hanlin Tang, David Lee, William Chou, Frank Wei, Jared Duke, Cody Coleman, Sam Davis, Jeffery Liao, Itay Hubara, Dilip Sequeira, Lingjie Xu, Pan Deng, Vijay Janapa Reddi, Guenther Schmuelling, Gennady Pekhimenko, Maximilien Breughe, Peng Meng, Greg Diamos, David Kanter, Colin Osborne, Thomas B. Jablin, Peizhao Zhang, Fei Sun, Pankaj Kanwar, Ramesh Chukka, J. Scott Gardner, Aaron Zhong, Christine Cheng, Peter Mattson, Brian M. Anderson
Published in:
ISCA
Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate ex
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7a2417894de20d08d29ba667a4080f9a
Author:
Henley Liu, Suresh Ramalingam, Xin Wu, Tom Lee, Susan Wu, Woon-Seong Kwon, Boon Y. Ang, Myongseob Kim, Liam Madden, Ephrem C. Wu, Jonathan Chang
Published in:
3D Integration in VLSI Circuits ISBN: 9781315200699
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::4c514c7f558307e67b5d0e66c9806f85
https://doi.org/10.1201/9781315200699-3
Published in:
FPL
FPGA-based neural networks typically leave performance on the table because the DSP resources run at less than a third of the peak clock rate. This paper presents a processing array architected to consistently achieve timing closure at 100% of the pe
Author:
Ephrem C. Wu
Published in:
IEEE Communications Magazine. 50:188-194
As line interfaces in communications chassis transition to 100 Gb/s and higher per port, many in the industry question when electrical backplanes inside these chassis will give way to optical ones. Provided that the maximum card-to-card distance over
Published in:
Scopus-Elsevier
This paper studies package reliability for the industry's first heterogeneous Stacked Silicon Interconnect (SSI) FPGA family (3D integration) delivering up to 2.78 Tb/s transceiver bandwidth. Each device is packaged on a low-temperature co-fired cera
Author:
Inkeun Cho, Ephrem C. Wu
Published in:
FPGA
A polynomial accelerator implemented with a custom high-dynamic-range number representation operates up to 534MHz in the slowest speed grade on a 28nm FPGA, a clock rate that a typical FPGA tool flow cannot achieve. This design tutorial shows how to
Author:
Ephrem C. Wu, Chris Wyland, Suresh Ramalingam, Paul Y. Wu, Bahareh Banijamali, Khaldoon Abugharbieh
Published in:
CICC
This paper reviews the interconnect and package design of a heterogeneous stacked-silicon FPGA. Up to five dice from two die types are mounted on a passive silicon interposer. A hardware- and software-scalable FPGA family can be created by mixing dif