On the Achievable Speeds of Finite Difference Solvers on CPUs and GPUs

Author:	Wolfgang A. Wall, Rainald Löhner, Andrew T. Corrigan, Karl-Robert Wichmann
Year of publication:	2013
Subject:	CUDA; Floating point; Fortran; Computer science; Computer Science::Mathematical Software; Code (cryptography); Finite difference; Parallel computing; Central processing unit; FLOPS; Porting
Source:	21st AIAA Computational Fluid Dynamics Conference.
DOI:	10.2514/6.2013-2852
Description:	A finite difference code for the weakly compressible Navier-Stokes equations was developed and then ported to the graphics processing unit (GPU) using F2CUDA, an automatic Fortran-to-CUDA translator. Detailed analysis revealed that the original ‘chunky’ single loop over the points required more registers than the GPU could handle. The right-hand-side (RHS) loop was therefore split by dimension, and fluxes were computed ‘on the fly’ to minimize register use. The final code, although less transparent and tidy than the original, delivered the expected performance on the GPU. Timing studies showed that performance on both CPU and GPU hardware is currently dominated by memory transfer rates. Even without accounting for any floating point operations (FLOPS), the theoretically achievable speeds based on the memory transfer hardware ratings of both the CPU and the GPU are within a factor of 1.5 of the timings obtained.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::3daeecf174b6f3cd36beb64d9a532070 https://doi.org/10.2514/6.2013-2852
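
The abstract's closing claim compares measured timings against a speed limit set purely by memory traffic. A minimal sketch of that kind of estimate follows. It is not the authors' method or code: the function names, the point count, the number of values moved per point, the bandwidth rating, and the measured time are all hypothetical placeholders chosen only to illustrate the arithmetic.

```python
def bandwidth_bound_time(n_points, values_per_point, bytes_per_value,
                         bandwidth_bytes_per_s):
    """Lower bound on the time for one sweep over the grid if only memory
    traffic mattered, i.e. floating point operations are ignored."""
    bytes_moved = n_points * values_per_point * bytes_per_value
    return bytes_moved / bandwidth_bytes_per_s


def slowdown_factor(measured_s, bound_s):
    """How many times slower the measured sweep is than the memory bound."""
    return measured_s / bound_s


# All values below are hypothetical, not taken from the paper.
N_POINTS = 1_000_000        # grid points updated per sweep
VALUES_PER_POINT = 10       # values read plus written per point
BYTES_PER_VALUE = 8         # double precision
BANDWIDTH = 80e9            # rated memory bandwidth in bytes per second

bound = bandwidth_bound_time(N_POINTS, VALUES_PER_POINT,
                             BYTES_PER_VALUE, BANDWIDTH)
measured = 1.4e-3           # hypothetical measured time per sweep in seconds
factor = slowdown_factor(measured, bound)

print(f"memory-bound time per sweep: {bound:.3e} s")
print(f"measured / bound: {factor:.2f}")
```

A factor close to 1 under this kind of estimate would suggest that the sweep is limited by memory transfer rather than arithmetic, which is the situation the abstract reports for both CPU and GPU.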