A GPU-based algorithm for efficient LES of high Reynolds number flows in heterogeneous CPU/GPU supercomputers
Author: | Georgios A. Leftheriotis, Athanassios A. Dimas, Guillermo Oyarzun, Iason A. Chalmoukis |
---|---|
Publication year: | 2020 |
Subject: | Computer science; Applied Mathematics; 02 engineering and technology; Solver; Grid; Supercomputer; 01 natural sciences; 020303 mechanical engineering & transports; 0203 mechanical engineering; Modeling and Simulation; 0103 physical sciences; Computer Science::Mathematical Software; Code (cryptography); Periodic boundary conditions; Overhead (computing); Central processing unit; Poisson's equation; 010301 acoustics; Algorithm |
Source: | Applied Mathematical Modelling. 85:141-156 |
ISSN: | 0307-904X |
DOI: | 10.1016/j.apm.2020.04.010 |
Description: | An optimized MPI+OpenACC implementation model that performs efficiently in CPU/GPU systems using large-eddy simulation is presented. The code was validated for the simulation of wave boundary-layer flows against numerical and experimental data in the literature. A direct Fast-Fourier-Transform-based solver was developed for the solution of the Poisson equation for pressure, taking advantage of the periodic boundary conditions. This solver was optimized for parallel execution in CPUs and is 10 times faster in computational time than a typical iterative preconditioned conjugate gradient solver in GPUs. In terms of parallel performance, an overlapping strategy was developed to reduce the overhead of performing MPI communications using GPUs. As a result, the weak scaling of the algorithm was improved by up to 30%. Finally, a large-scale simulation (Re = 2 × 10^5) using a grid of 4 × 10^8 cells was executed, and the performance of the code was analyzed. The simulation was launched using up to 512 nodes (512 GPUs + 6144 CPU-cores) on one of the current top 10 supercomputers in the world (Piz Daint). A comparison of the overall computational time showed that the GPU version was 4.2 times faster than the CPU one. The parallel efficiency of this strategy (47%) is competitive compared with the state-of-the-art CPU implementations, and it has the potential to take advantage of modern supercomputing capabilities. |
Database: | OpenAIRE |
External link: |
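
Note on the description: the abstract mentions a direct FFT-based Poisson solver that exploits periodic boundary conditions. The following is only a minimal sketch of that general idea, not the authors' solver. It assumes a two-dimensional, doubly periodic box, a uniform grid, and spectral (exact) wavenumbers; the function name `solve_poisson_periodic` and all parameters are hypothetical, and NumPy stands in for the MPI+OpenACC code described in the record.

```python
import numpy as np


def solve_poisson_periodic(f, lx, ly):
    """Solve laplacian(p) = f on a doubly periodic lx-by-ly box.

    Hypothetical helper for illustration only. The solution is fixed
    to zero mean, since p is defined only up to an additive constant.
    """
    ny, nx = f.shape
    # Angular wavenumbers of the discrete Fourier modes in each direction.
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=lx / nx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=ly / ny)
    k2 = kx[np.newaxis, :] ** 2 + ky[:, np.newaxis] ** 2

    f_hat = np.fft.fft2(f)
    k2[0, 0] = 1.0           # placeholder to avoid dividing by zero
    p_hat = -f_hat / k2      # in Fourier space: -|k|^2 * p_hat = f_hat
    p_hat[0, 0] = 0.0        # choose the zero-mean solution
    return np.real(np.fft.ifft2(p_hat))


# Check against a known solution: p = sin(x) cos(2y), so laplacian(p) = -5 p.
n = 64
length = 2.0 * np.pi
x = np.arange(n) * length / n
X, Y = np.meshgrid(x, x)
p_exact = np.sin(X) * np.cos(2.0 * Y)
f = -5.0 * p_exact

p = solve_poisson_periodic(f, length, length)
max_error = np.max(np.abs(p - p_exact))
```

Because each Fourier mode is solved independently with a single division, no iteration is needed, which is the usual reason a direct solver of this kind can compete with an iterative conjugate gradient method when the boundaries are periodic. A finite-difference code would typically replace `k2` with the modified wavenumbers of its discrete Laplacian.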