Showing 1 - 10 of 11 for search: '"I-Jui Sung"'
Author:
I-Jui Sung, 宋宜叡
92
Effective utilization of on-chip memories has always been an important factor in improving the performance of a program. Caching techniques have been studied for decades, whilst recently scratch-pad memory has been receiving more and more attention,
External link:
http://ndltd.ncl.edu.tw/handle/64563300131731962703
Author:
Wen-mei W. Hwu, José María González-Linares, Nicolás Guil, Juan Gómez-Luna, Li-Wen Chang, I-Jui Sung
Published in:
IEEE Transactions on Parallel and Distributed Systems. 27:776-788
Matrix transposition is an important algorithmic building block for many numeric algorithms such as FFT. With more and more algebra libraries offloading to GPUs, a high-performance in-place transposition becomes necessary. Intuitively, in-place trans
Published in:
PPOPP
Matrix transposition is an important algorithmic building block for many numeric algorithms such as FFT. It has also been used to convert the storage layout of arrays. With more and more algebra libraries offloaded to GPUs, a high-performance in-plac
Author:
Wen-mei W. Hwu, Nasser Anssari, Geng (Daniel) Liu, John A. Stratton, Nady Obeid, Christopher I. Rodrigues, Li-Wen Chang, I-Jui Sung
Published in:
Computer. 45:26-32
A study of the implementation patterns among massively threaded applications for many-core GPUs reveals that each of the seven most commonly used algorithm and data optimization techniques can enhance the performance of applicable kernels by 2 to 10
Published in:
International Journal of Parallel Programming. 40:4-24
We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid
Published in:
ICPP
In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Da
While OpenCL was originally designed as an application programming interface (API) for human developers, it can also serve as an implementation platform for higher-level object-oriented programming languages such as C++. Targeting OpenCL rather than
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::28088a97f1504df80e363372d068c490
https://doi.org/10.1016/b978-0-12-801414-1.00011-9
Author:
Nady Obeid, Nasser Anssari, Li-Wen Chang, John A. Stratton, I-Jui Sung, Christopher I. Rodrigues, Geng Daniel Liu, Wen-mei W. Hwu
Published in:
2012 Innovative Parallel Computing (InPar).
It is unquestionable that successive hardware generations have significantly improved GPU computing workload performance over the last several years. Moore's law and DRAM scaling have respectively increased single-chip peak instruction throughput by
Published in:
2012 Innovative Parallel Computing (InPar).
For many-core architectures such as GPUs, efficient off-chip memory access is crucial to high performance; applications are often limited by off-chip memory bandwidth. Transforming data layout is an effective way to reshape the access patterns t
Published in:
PACT
We present automatic data layout transformation as an effective compiler performance optimization for memory-bound structured grid applications. Structured grid applications include stencil codes and other code structures using a dense, regular grid