Accelerated FDPS --- Algorithms to Use Accelerators with FDPS
Authors: | Junichiro Makino, Keigo Nitadori, Daisuke Namekata, Masaki Iwasawa, Kentaro Nomura, Long Wang, Miyuki Tsubouchi |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
Physics
Galaxy: evolution; Interface (computing); Bandwidth (signal processing); Parallel algorithm; Degree of parallelism; Performance tuning; FOS: Physical sciences; Astronomy and Astrophysics; Data structure; 01 natural sciences; 010305 fluids & plasmas; methods: numerical; (cosmology:) dark matter; Space and Planetary Science; 0103 physical sciences; planets and satellites: formation; Central processing unit; General-purpose computing on graphics processing units; Astrophysics - Instrumentation and Methods for Astrophysics; Instrumentation and Methods for Astrophysics (astro-ph.IM); 010303 astronomy & astrophysics; Algorithm |
Description: | We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose computing on graphics processing units). We have developed FDPS to make it possible for researchers to develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structure and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they are writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the degree of parallelism available in the interaction function and that of the hardware. We have modified the interface of the user-provided interaction functions so that accelerators are used more efficiently. We have also implemented new techniques that reduce the amount of work on the CPU side and the amount of communication between the CPU and accelerators. We have measured the performance of N-body simulations using FDPS on a system with an NVIDIA Volta GPGPU; the achieved performance is around 27% of the theoretical peak. We have constructed a detailed performance model and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems. |
Database: | OpenAIRE |
External link: |
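
The description attributes the earlier inefficiency partly to the latency and bandwidth of CPU-accelerator communication. A toy cost model may help show why per-call latency matters. This is only an illustrative sketch, not the performance model from the paper: the function `offload_time`, the idea of batching several interaction lists into one call, and every numeric value are assumptions made up for the example.

```python
def offload_time(n_calls, bytes_total, flops_total,
                 latency_s, bandwidth_bytes_per_s, peak_flops):
    """Toy estimate of accelerator time for one force-calculation step.

    Each call to the accelerator pays a fixed latency; transfer and
    arithmetic costs depend only on the total data and total work.
    """
    return (n_calls * latency_s
            + bytes_total / bandwidth_bytes_per_s
            + flops_total / peak_flops)


# Hypothetical placeholder values, NOT measurements from the paper.
N_LISTS = 10_000           # interaction lists handled per step
BYTES_PER_LIST = 64_000    # bytes moved per interaction list
FLOPS_PER_LIST = 5.0e7     # arithmetic per interaction list
LATENCY = 1.0e-5           # seconds of overhead per accelerator call
BANDWIDTH = 1.0e10         # bytes per second between CPU and accelerator
PEAK = 1.0e13              # accelerator peak, operations per second
LISTS_PER_CALL = 100       # assumed batch size for the batched variant

total_bytes = N_LISTS * BYTES_PER_LIST
total_flops = N_LISTS * FLOPS_PER_LIST

# One accelerator call per interaction list.
t_per_list = offload_time(N_LISTS, total_bytes, total_flops,
                          LATENCY, BANDWIDTH, PEAK)

# Same data and work, but many lists sent per call.
t_batched = offload_time(N_LISTS // LISTS_PER_CALL, total_bytes, total_flops,
                         LATENCY, BANDWIDTH, PEAK)

# Fraction of the step spent on arithmetic at peak rate.
compute_time = total_flops / PEAK
print(f"per-list calls: {t_per_list:.4f} s, "
      f"efficiency {compute_time / t_per_list:.2f}")
print(f"batched calls:  {t_batched:.4f} s, "
      f"efficiency {compute_time / t_batched:.2f}")
```

In this model the transfer and arithmetic terms are identical in both variants, so the only saving comes from paying the fixed latency fewer times. Lowering the transfer term would require moving less data, which the model does not attempt to capture.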