Showing 1 - 10 of 106 for search: '"Fang, Jianbin"'
Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000
Published in:
Frontiers of Information Technology & Electronic Engineering, 2022
As the hardware industry moves towards using specialized heterogeneous many-cores to avoid the effects of the power wall, software developers are finding it hard to deal with the complexity of these systems. This article shares our experience when de
External link:
http://arxiv.org/abs/2210.12230
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedded systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential can only b
External link:
http://arxiv.org/abs/2005.04094
This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a performance model
External link:
http://arxiv.org/abs/2003.04294
Understanding the scalability of parallel programs is crucial for software optimization and hardware architecture design. As HPC hardware is moving towards many-core design, it becomes increasingly difficult for a parallel program to make effective u
External link:
http://arxiv.org/abs/1911.08779
Author:
Qin, Qing, Ren, Jie, Yu, Jialong, Gao, Ling, Wang, Hai, Zheng, Jie, Feng, Yansong, Fang, Jianbin, Wang, Zheng
The recent advances in deep neural networks (DNNs) make them attractive for embedded systems. However, it can take a long time for DNNs to make an inference on resource-constrained computing devices. Model compression techniques can address the compu
External link:
http://arxiv.org/abs/1810.08899
Sparse matrix vector multiplication (SpMV) is one of the most common operations in scientific and high-performance applications, and is often responsible for the application performance bottleneck. While the sparse matrix representation has a signifi
External link:
http://arxiv.org/abs/1805.11938
Many-core accelerators, as represented by the XeonPhi coprocessors and GPGPUs, allow software to exploit spatial and temporal sharing of computing resources to improve the overall system performance. To unlock this performance potential requires soft
External link:
http://arxiv.org/abs/1802.02760
Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Currently, very few cases have been streamed to demonstrate the streaming performance impact and a systematic investi
External link:
http://arxiv.org/abs/1608.03044