Almost Optimal Column-wise Prefix-sum Computation on the GPU

Autor: Yasuaki Ito, Koji Nakano, Toru Fujita, Hiroki Tokura
Rok vydání: 2018
Předmět:
Zdroj: Parallel Processing and Applied Mathematics ISBN: 9783319780535
PPAM (2)
Popis: The row-wise and column-wise prefix-sum computation of a matrix has many applications in the area of image processing such as computation of the summed area table and the Euclidean distance map. It is known that the prefix-sums of a 1-dimensional array can be computed efficiently on the GPU. Hence, the row-wise prefix-sums of a matrix can also be computed efficiently on the GPU by executing this prefix-sum algorithm for every row in parallel. However, the same approach does not work well for computing the column-wise prefix-sums, because inefficient stride memory access to the global memory is performed. The main contribution of this paper is to present an almost optimal column-wise prefix-sum algorithm on the GPU. Since all elements in an input matrix must be read and the resulting prefix-sums must be written, computation of the column-wise prefix-sums cannot be faster than simple matrix duplication in the global memory of the GPU. Quite surprisingly, experimental results using NVIDIA TITAN X show that our column-wise prefix-sum algorithm runs only 2–6% slower than matrix duplication. Thus, our column-wise prefix-sum algorithm is almost optimal.
Databáze: OpenAIRE