Showing 1 - 6 of 6 for search: '"Zhangxiaowen Gong"'
Author:
Zhangxiaowen Gong, Houxiang Ji, Yao Yao, Christopher W. Fletcher, Christopher J. Hughes, Josep Torrellas
Published in:
Proceedings of the 49th Annual International Symposium on Computer Architecture.
Author:
Christopher J. Hughes, Sara S. Baghsorkhi, Zhangxiaowen Gong, Josep Torrellas, Christopher W. Fletcher, Houxiang Ji
Published in:
MICRO
General Matrix Multiplication (GEMM) is the key operation in Deep Neural Networks (DNNs). While dense GEMM uses SIMD CPUs efficiently, sparse GEMM is much less efficient, especially at the modest levels of unstructured sparsity common in DNN inferenc
Author:
Christopher W. Fletcher, Christopher J. Hughes, Josep Torrellas, Zhangxiaowen Gong, Houxiang Ji
Published in:
PACT
Our community has improved the efficiency of deep learning applications by exploiting sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known statically, and/or for specialized hardware. In this paper, we propo
Author:
Neftali Watkinson, David Padua, David C. Wong, Zehra Sura, Zhangxiaowen Gong, Alexandru Nicolau, Saeed Maleki, Zhi Chen, Josep Torrellas, Alexander V. Veidenbaum, Justin Szaday
Published in:
Proceedings of the ACM on Programming Languages. 2:1-29
Modern compiler optimization is a complex process that offers no guarantees to deliver the fastest, most efficient target code. For this reason, compilers struggle to produce a stable performance from versions of code that carry out the same computat
Author:
Neftali Watkinson, Alexandru Nicolau, Zhi Chen, Alexander V. Veidenbaum, Zhangxiaowen Gong, Aniket Shivam
Published in:
Languages and Compilers for Parallel Computing ISBN: 9783030352240
LCPC
Vectorization is the process of transforming the scalar implementation of an algorithm into vector form. This transformation aims to benefit from parallelism through the generation of microprocessor vector instructions. Using abstract models and sour
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::6c5ee2cbc7b1cbdf8b19f652b5a334dc
https://doi.org/10.1007/978-3-030-35225-7_1
Author:
Justin Szaday, David C. Wong, Zhi Chen, Gerald DeJong, Alexander V. Veidenbaum, Saeed Maleki, Josep Torrellas, Zhangxiaowen Gong, Alexandru Nicolau, Neftali Watkinson, Zehra Sura, David Padua
Published in:
IISWC
Although numerous loop optimization techniques have been designed and deployed in commercial compilers in the past, virtually no common experimental infrastructure nor repository exists to help the compiler community evaluate the effectiveness of the