Scalability of Incompressible Flow Computations on Multi-GPU Clusters Using Dual-Level and Tri-Level Parallelism

Authors:	Dana Jacobsen, Inanc Senocak
Year of publication:	2011
Subject:	CUDA; Computer science; Data parallelism; Node (networking); Scalability; Parallelism (grammar); Task parallelism; GPU cluster; Parallel computing; Software_PROGRAMMINGTECHNIQUES; Supercomputer; ComputingMethodologies_COMPUTERGRAPHICS; Computational science
Source:	49th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition.
DOI:	10.2514/6.2011-947
Description:	High-performance computing using graphics processing units (GPUs) is gaining popularity in scientific computing, and many large compute clusters are being augmented with multiple GPUs in each node. We investigate hybrid tri-level (MPI-OpenMP-CUDA) parallel implementations to explore the efficiency and scalability of incompressible flow computations on GPU clusters of up to 128 GPUs. This work details some of the unique issues faced when merging fine-grain parallelism on the GPU using CUDA with coarse-grain parallelism that uses OpenMP for intra-node communication and MPI for inter-node communication. The tri-level MPI-OpenMP-CUDA and dual-level MPI-CUDA implementations are compared using large computational fluid dynamics (CFD) simulations. Our results demonstrate that a tri-level parallel implementation does not provide a significant performance advantage over the dual-level implementation; however, further research is needed to confirm whether this conclusion holds for clusters with a high GPU-per-node density or for software that can use OpenMP’s fine-grain parallelism more effectively.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::c64cc4d3e66959b8f0759342e467bdb4 https://doi.org/10.2514/6.2011-947
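
The difference between the dual-level (MPI-CUDA) and tri-level (MPI-OpenMP-CUDA) layouts named in the description can be illustrated with a small bookkeeping sketch. It is not taken from the paper: it assumes that the dual-level layout runs one MPI process per GPU and that the tri-level layout runs one MPI process per node with one OpenMP thread per GPU, which is a common arrangement but is not confirmed by this record. The names `Layout`, `dual_level`, `tri_level`, and `locate_gpu` are hypothetical, and the value of 2 GPUs per node is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    mpi_ranks: int         # processes that communicate over MPI
    threads_per_rank: int  # OpenMP-style threads inside each process


def dual_level(total_gpus: int) -> Layout:
    """Assumed MPI-CUDA layout: one MPI process drives each GPU."""
    return Layout(mpi_ranks=total_gpus, threads_per_rank=1)


def tri_level(total_gpus: int, gpus_per_node: int) -> Layout:
    """Assumed MPI-OpenMP-CUDA layout: one MPI process per node,
    one thread per GPU inside that node."""
    if gpus_per_node <= 0 or total_gpus % gpus_per_node != 0:
        raise ValueError("total_gpus must be a positive multiple of gpus_per_node")
    return Layout(mpi_ranks=total_gpus // gpus_per_node,
                  threads_per_rank=gpus_per_node)


def locate_gpu(global_gpu: int, gpus_per_node: int) -> tuple[int, int]:
    """Map a global GPU index to (node index, local device index)."""
    return divmod(global_gpu, gpus_per_node)


if __name__ == "__main__":
    GPUS = 128          # largest configuration mentioned in the description
    GPUS_PER_NODE = 2   # illustrative assumption, not stated in this record
    print(dual_level(GPUS))
    print(tri_level(GPUS, GPUS_PER_NODE))
    print(locate_gpu(5, GPUS_PER_NODE))
```

Under these assumptions both layouts drive the same number of GPUs; the tri-level layout only trades MPI processes for threads within a node, so the amount of inter-node MPI traffic per node is what changes, not the GPU work itself.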