Popis: |
This bachelor thesis presents a method to parallelise the Lattice Boltzmann method across several graphics processing units (GPUs) by coupling it with the Message Passing Interface (MPI). The need for this arises mainly from the limited on-board memory of a single GPU and the memory intensity of the Lattice Boltzmann method. A concrete algorithm for the simulation software FluidX3D is presented and validated. The algorithm is flexible enough to work with different extensions of the Lattice Boltzmann method: besides complex geometric boundary conditions, heat flux and condensation processes, the simulation of free surfaces is also possible. A particular challenge is combining the function-oriented MPI communication with the object-oriented approach of FluidX3D. The thesis explains various optimizations of the multi-GPU extension: on the one hand, they rely on knowledge of the programming language OpenCL and of GPU hardware; on the other hand, the algorithm itself is extended so that computation and memory transfer can overlap. The optimizations are confirmed by runtime measurements on two different clusters using up to 4 GPUs simultaneously. In weak scaling, the multi-GPU algorithm reaches 95% of its theoretical optimum, almost independently of the number of GPUs used. In strong scaling, the efficiency with 4 GPUs is 77%. For a cubic benchmark setup, up to 13600 MLUPs were achieved with 4 Radeon VII GPUs. |