Showing 1 - 10 of 16 results
for search: '"Weikang Qiao"'
Published in:
Communications of the ACM. Jan 2023, Vol. 66 Issue 1, p74-85. 12p. 1 Color Photograph, 4 Diagrams, 11 Charts, 1 Graph.
Authors:
Licheng Guo, Pongstorn Maidee, Yun Zhou, Chris Lavin, Eddie Hung, Wuxi Li, Jason Lau, Weikang Qiao, Yuze Chi, Linghao Song, Yuanlong Xiao, Alireza Kaviani, Zhiru Zhang, Jason Cong
Published in:
ACM Transactions on Reconfigurable Technology and Systems.
FPGAs require a much longer compilation cycle than conventional computing platforms like CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-bit
In the past few years, domain-specific accelerators (DSAs), such as Google's Tensor Processing Units, have been shown to offer significant performance and energy-efficiency gains over general-purpose CPUs. An important question is whether typical software devel
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::75cc995366ff32ba3b31df6d3d6b4a3d
http://arxiv.org/abs/2209.02951
The emergence of high-bandwidth memory (HBM) brings new opportunities to boost the performance of sorting acceleration on FPGAs, which was conventionally bounded by the available off-chip memory bandwidth. However, it is nontrivial for designers to f
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::066ffbf9a970c0f4695d63ecee68dd83
http://arxiv.org/abs/2205.07991
Authors:
Licheng Guo, Pongstorn Maidee, Yun Zhou, Chris Lavin, Jie Wang, Yuze Chi, Weikang Qiao, Alireza Kaviani, Zhiru Zhang, Jason Cong
Published in:
FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
FPGAs require a much longer compilation cycle than conventional computing platforms like CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS compilation (C-to-RTL) and the back-end physical implementation (RTL-to-bit
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::c53a2ab5e75d3bc0862cc270c952fb41
https://biblio.ugent.be/publication/8742525
Authors:
Licheng Guo, Zhenman Fang, Zhiru Zhang, Jason Cong, Yuze Chi, Jason Lau, Linghao Song, Xingyu Tian, Moazin Khatti, Weikang Qiao, Jie Wang, Ecenur Ustun
Published in:
Web of Science
In this paper, we propose TAPA, an end-to-end framework that compiles a C++ task-parallel dataflow program into a high-frequency FPGA accelerator. Compared to existing solutions, TAPA has two major advantages. First, TAPA provides a set of convenient
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3b600742f18716ec6675d992c2d33add
Published in:
FCCM
Large-scale sorting is always an important yet demanding task for data center applications. In addition to powerful processing capability, a high-performance sorting system requires efficient utilization of the available bandwidth of various levels in
Published in:
FPGA
With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory bandwidth. This allows more memory-bound applications to benefit from FPGA acceleration. However, fully utilizing the
Authors:
Jie Wang, Jason Cong, Zhiru Zhang, Yuze Chi, Jason Lau, Licheng Guo, Weikang Qiao, Ecenur Ustun
Published in:
FPGA
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable clock frequency between an HLS-generated design and a handcrafted RTL one. A key factor that limits
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3ff1a3952aee4f8a74022b6967f1a740
https://europepmc.org/articles/PMC8041363/
Published in:
ISCA
Sorting is a key computational kernel in many big data applications. Most sorting implementations focus on a specific input size, record width, and hardware configuration. This has created a wide array of sorters that are optimized only to a narrow a