Almost Optimal Column-wise Prefix-sum Computation on the GPU
Autor: | Yasuaki Ito, Koji Nakano, Toru Fujita, Hiroki Tokura |
---|---|
Rok vydání: | 2018 |
Předmět: |
020203 distributed computing
Computer science Computation Parallel algorithm 020207 software engineering Image processing 02 engineering and technology Parallel computing Summed area table CUDA Matrix (mathematics) Titan (supercomputer) 0202 electrical engineering electronic engineering information engineering Prefix sum Computer Science::Formal Languages and Automata Theory |
Zdroj: | Parallel Processing and Applied Mathematics ISBN: 9783319780535 PPAM (2) |
Popis: | The row-wise and column-wise prefix-sum computation of a matrix has many applications in the area of image processing such as computation of the summed area table and the Euclidean distance map. It is known that the prefix-sums of a 1-dimensional array can be computed efficiently on the GPU. Hence, the row-wise prefix-sums of a matrix can also be computed efficiently on the GPU by executing this prefix-sum algorithm for every row in parallel. However, the same approach does not work well for computing the column-wise prefix-sums, because inefficient stride memory access to the global memory is performed. The main contribution of this paper is to present an almost optimal column-wise prefix-sum algorithm on the GPU. Since all elements in an input matrix must be read and the resulting prefix-sums must be written, computation of the column-wise prefix-sums cannot be faster than simple matrix duplication in the global memory of the GPU. Quite surprisingly, experimental results using NVIDIA TITAN X show that our column-wise prefix-sum algorithm runs only 2–6% slower than matrix duplication. Thus, our column-wise prefix-sum algorithm is almost optimal. |
Databáze: | OpenAIRE |
Externí odkaz: |