TOAST: Automatic tiling for iterative stencil computations on GPUs

Authors:	Alyson D. Pereira, Rodrigo C. O. Rocha, Luiz Ramos, Luís F. W. Góes
Year of publication:	2017
Subject:	020203 distributed computing; Computer Networks and Communications; Stencil code; Computer science; Locality; Process (computing); Volume (computing); 02 engineering and technology; Parallel computing; Solver; Stencil; Computer Science Applications; Theoretical Computer Science; Computational Theory and Mathematics; 0202 electrical engineering, electronic engineering, information engineering; Overhead (computing); 020201 artificial intelligence & image processing; Central processing unit; Software; ComputingMethodologies_COMPUTERGRAPHICS
Source:	Concurrency and Computation: Practice and Experience. 29:e4053
ISSN:	1532-0626
DOI:	10.1002/cpe.4053
Description:	The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics processing units (GPUs). In particular, tiling is a technique that can significantly enhance application performance by improving data locality and by reducing the volume of communication between host memory and the GPU. In addition, tiling enables stencil applications to process inputs that are larger than the physical GPU memory. However, implementing tiling efficiently is complex, time-consuming, and error-prone. In this paper, we propose transparently optimized automatic stencil tiling (TOAST), an automatic tiling mechanism for iterative stencil computations running on GPUs. TOAST has three main benefits: (1) it incorporates an optimization model that seeks to maximize data reuse within tiles while respecting the amount of dynamically available GPU memory; (2) it offers a virtualized GPU memory for stencil computations, allowing for large input data; and (3) it performs optimal tiling transparently to the developer of the parallel stencil application. The current implementation of TOAST augments the PSkel framework with an internal solver based on genetic algorithms. Our experimental results show that TOAST improves the performance of iterative stencil applications by up to 13× compared with their optimized multithreaded central processing unit (CPU)-based versions and by up to 48× compared with a naive tiling approach on the GPU. The TOAST mechanism automatically achieves a low percentage overhead of data management relative to the actual stencil computation.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::8b658eff7afa836603ef34626051b9c2 https://doi.org/10.1002/cpe.4053
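
Illustration:	The tiling idea summarized in the description can be sketched in a few lines of Python. This is not TOAST or PSkel code; it is a minimal sketch under stated assumptions: a 1D three-point averaging stencil, plain Python lists standing in for host and GPU memory, and a fixed tile size derived from a hypothetical per-buffer limit (max_cells) instead of the genetic-algorithm solver the paper describes. All function and parameter names are invented for the example. Each tile is copied once together with a halo of `iterations` cells per side, iterated locally, and only its interior is written back, so data is reused within the tile and the input may be larger than the buffer.

```python
def stencil_step(src):
    """One 3-point averaging step; the first and last cells keep their value."""
    out = list(src)
    for i in range(1, len(src) - 1):
        out[i] = (src[i - 1] + src[i] + src[i + 1]) / 3.0
    return out


def run_untiled(data, iterations):
    """Reference version: iterate over the whole input at once."""
    cur = list(data)
    for _ in range(iterations):
        cur = stencil_step(cur)
    return cur


def run_tiled(data, iterations, max_cells):
    """Tiled version: each buffer holds at most `max_cells` cells.

    `max_cells` is a hypothetical stand-in for the available device memory.
    The stencil radius is 1, so `iterations` local steps need a halo of
    `iterations` cells on each side to keep the tile interior correct.
    """
    n = len(data)
    halo = iterations
    tile = max_cells - 2 * halo  # interior cells per tile
    if tile < 1:
        raise ValueError("max_cells is too small for this many iterations")

    result = [0.0] * n
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(0, start - halo)   # clamp the halo at the domain edges
        hi = min(n, end + halo)
        buf = data[lo:hi]           # single "host to device" copy
        for _ in range(iterations):
            buf = stencil_step(buf)  # all iterations reuse the local copy
        # Halo cells are stale after iterating; keep only the interior.
        result[start:end] = buf[start - lo:end - lo]
    return result


if __name__ == "__main__":
    data = [float((i * 7) % 11) for i in range(50)]
    same = run_tiled(data, 3, 16) == run_untiled(data, 3)
    print("tiled result matches untiled result:", same)
```

The trade-off the sketch exposes is the one the description attributes to the optimization model: more iterations per tile mean fewer transfers but a wider halo, which leaves fewer interior cells per tile within the same memory limit.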