623 Tflop/s HPCG run on Tianhe-2: Leveraging millions of hybrid cores.

Authors: Liu, Yiqun; Yang, Chao; Liu, Fangfang; Zhang, Xianyi; Lu, Yutong; Du, Yunfei; Yang, Canqun; Xie, Min; Liao, Xiangke
Subject:
Source: International Journal of High Performance Computing Applications; Spring 2016, Vol. 30, Issue 1, pp. 39-54 (16 pages)
Abstract: In this article, we present a new hybrid algorithm to enable and scale the high-performance conjugate gradients (HPCG) benchmark on large-scale heterogeneous systems such as the Tianhe-2. Based on an inner–outer subdomain partitioning strategy, the data distribution between host and device can be balanced adaptively. The overhead of data movement from both MPI communication and PCI-E transfers can be significantly reduced by carefully rearranging and fusing operations. A variety of parallelization and optimization techniques for performance-critical kernels are exploited and analyzed to maximize the performance gain on both host and device. We carry out experiments on both a small heterogeneous computer and the world’s largest one, the Tianhe-2. On the small system, we present a thorough comparison and analysis to select among the different optimization choices. On Tianhe-2, the optimized implementation scales to the full system of 3.12 million heterogeneous cores, with an aggregate performance of 623 Tflop/s and a parallel efficiency of 81.2%. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
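
Note: The abstract mentions an inner–outer subdomain partitioning in which the host/device data distribution is balanced adaptively, but this record does not give the details. The following is only a minimal Python sketch of one way such a balance could work, not the authors' implementation. It assumes a cubic local subdomain of n x n x n grid points, an inner block assigned to one processor and the surrounding shell to the other, and run time that scales linearly with the number of points. The function names (`inner_fraction`, `choose_shell_width`, `rebalance`) are hypothetical.

```python
def inner_fraction(n, w):
    """Fraction of an n*n*n subdomain that lies in the inner block
    when an outer shell of width w is peeled off every face."""
    inner = max(n - 2 * w, 0)
    return inner ** 3 / n ** 3


def choose_shell_width(n, target_inner_fraction):
    """Pick the shell width whose inner fraction is closest to the target.
    Widths range from 1 (thin shell) to n // 2 (inner block empty or tiny)."""
    widths = range(1, n // 2 + 1)
    return min(widths, key=lambda w: abs(inner_fraction(n, w) - target_inner_fraction))


def rebalance(inner_share, t_inner, t_outer):
    """Return a new inner share that would equalize the two run times,
    assuming (hypothetically) that time is proportional to assigned work.

    inner_share: fraction of points currently in the inner block (0 < x < 1)
    t_inner, t_outer: measured times for the inner and outer parts
    """
    if not 0.0 < inner_share < 1.0:
        raise ValueError("inner_share must be strictly between 0 and 1")
    if t_inner <= 0 or t_outer <= 0:
        raise ValueError("timings must be positive")
    inner_rate = inner_share / t_inner          # work per unit time, inner side
    outer_rate = (1.0 - inner_share) / t_outer  # work per unit time, outer side
    return inner_rate / (inner_rate + outer_rate)


if __name__ == "__main__":
    # Start from an even split; suppose the inner side finished twice as fast.
    share = rebalance(0.5, t_inner=1.0, t_outer=2.0)
    width = choose_shell_width(16, share)
    print(f"new inner share: {share:.3f}, shell width: {width}")
```

Because the shell width is an integer, the achievable inner fractions are discrete, so the sketch picks the nearest one rather than matching the target share exactly.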