Description: |
As an essential part of current mainstream computing systems, GPUs are not only powerful graphics engines but also highly parallel programmable processors. In heterogeneous systems with multiple CPUs and multiple GPUs, the CPUs and GPUs must collaborate to achieve high computing performance. Developing parallel algorithms for such architectures is challenging because of issues such as communication, load balancing, memory spaces, and synchronization. We present a parallel block Cholesky factorization algorithm for heterogeneous multi-CPU and multi-GPU architectures. First, the matrix is partitioned into blocks of different sizes according to the performance of each CPU and GPU. Then, a one-dimensional row block-cyclic distribution strategy allocates the row blocks to the CPUs and GPUs so as to minimize communication. The computing tasks associated with each row block are then executed by the CPU or GPU that holds that block. Experiments on a system with two CPUs and eight GPUs show good load balancing, parallelism, and scalability, as well as low communication cost. |
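The abstract does not specify the partitioning formula or the factorization kernels, so the following is only a minimal NumPy sketch under stated assumptions. Each device is given a hypothetical relative-speed figure, and row-block heights are made proportional to it. Blocks are dealt out cyclically (block `b` goes to device `b % p`). The factorization itself is a standard right-looking blocked Cholesky in which the owner of row block `i` performs every operation that writes row block `i`. The names `row_block_bounds` and `blocked_cholesky` and the speed values are illustrative, not the authors' implementation. All "devices" here run on the host, and the sketch only counts the tasks each one would execute.

```python
import numpy as np


def row_block_bounds(n, speeds, cycles):
    """Return (start, stop, owner) for each row block.

    Hypothetical scheme: cut n rows into len(speeds) * cycles blocks,
    assign block b to device b % p (1-D row block-cyclic), and make each
    block's height proportional to its owner's assumed relative speed.
    """
    p = len(speeds)
    weights = [speeds[b % p] for b in range(p * cycles)]
    cum = np.cumsum([0.0] + weights)
    # Round cumulative weights to integer row boundaries so sizes sum to n.
    edges = np.rint(cum / cum[-1] * n).astype(int)
    blocks = []
    for b in range(p * cycles):
        start, stop = int(edges[b]), int(edges[b + 1])
        if stop > start:  # skip empty blocks when n is small
            blocks.append((start, stop, b % p))
    return blocks


def blocked_cholesky(A, blocks):
    """Right-looking blocked Cholesky A = L @ L.T over the given row blocks.

    Owner-computes rule (assumed): the device owning row block i performs
    every update to row block i. Returns L and a per-device task count.
    """
    L = np.array(A, dtype=float)  # working copy; only the lower part is used
    tasks = {}
    nb = len(blocks)
    for k in range(nb):
        ks, ke, kown = blocks[k]
        # Factor the (already updated) diagonal block on its owner.
        L[ks:ke, ks:ke] = np.linalg.cholesky(L[ks:ke, ks:ke])
        tasks[kown] = tasks.get(kown, 0) + 1
        Lkk = L[ks:ke, ks:ke]
        for i in range(k + 1, nb):
            is_, ie, iown = blocks[i]
            # Panel solve L_ik = A_ik L_kk^{-T}, done by the owner of row block i.
            L[is_:ie, ks:ke] = np.linalg.solve(Lkk, L[is_:ie, ks:ke].T).T
            tasks[iown] = tasks.get(iown, 0) + 1
        for i in range(k + 1, nb):
            is_, ie, iown = blocks[i]
            Lik = L[is_:ie, ks:ke]
            # Trailing update of row block i: A_ij -= L_ik L_jk^T for k < j <= i.
            for j in range(k + 1, i + 1):
                js, je, _ = blocks[j]
                L[is_:ie, js:je] -= Lik @ L[js:je, ks:ke].T
            tasks[iown] = tasks.get(iown, 0) + 1
    # Zero the untouched upper-triangular blocks.
    return np.tril(L), tasks
```

As a usage example, take two "CPUs" with assumed speed 1.0 and two "GPUs" with assumed speed 4.0. With `n = 40` and `cycles = 2`, each GPU row block is four times taller than each CPU row block (heights 2, 2, 8, 8, repeated twice). The result should match `np.linalg.cholesky`, because the Cholesky factor with positive diagonal is unique:

```python
rng = np.random.default_rng(0)
n = 40
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)  # symmetric positive definite test matrix
blocks = row_block_bounds(n, speeds=[1.0, 1.0, 4.0, 4.0], cycles=2)
L, tasks = blocked_cholesky(A, blocks)
print(np.allclose(L, np.linalg.cholesky(A)))  # True
```

Dealing blocks out cyclically means every device keeps receiving work as the active trailing matrix shrinks. Sizing the blocks by speed is one simple way to approximate the load balancing the abstract describes.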