Description: |
A finite-difference code for the weakly compressible Navier-Stokes equations has been developed and then ported to the graphics processing unit (GPU) using the automatic Fortran-to-CUDA translator F2CUDA. Detailed analysis revealed that the original, ‘chunky’ single loop over the points required more registers than the GPU could handle. The right-hand-side (RHS) loop was therefore split by dimension, and fluxes were computed ‘on the fly’ to minimize the number of registers. The final code, although not as transparent and tidy as the original, achieved the expected performance on the GPU. Timing studies revealed that performance on both CPU and GPU hardware is at present dominated by memory transfer rates. Theoretically achievable speeds estimated from the memory transfer ratings of the CPU and GPU hardware alone, without accounting for any floating-point operations (FLOPs), are within a factor of 1.5 of the timings obtained. |