Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms

Authors: Rob V. van Nieuwpoort, Stijn Heldens, Pieter Hijma, Ben van Werkhoven, Henri E. Bal, Jason Maassen
Contributors: IvI Research (FNWI), Multiscale Networked Systems (IvI, FNWI), Computer Systems, Network Institute, High Performance Distributed Computing, Mathematics
Language: English
Year of publication: 2020
Source: Proceedings of SC20: The International Conference for High Performance Computing, Networking, Storage and Analysis: virtual event, November 9-19, 2020
Heldens, S, Hijma, H P, van Werkhoven, B, Maassen, J, Bal, H & van Nieuwpoort, R 2021, Rocket: efficient and scalable all-pairs computations on heterogeneous platforms. In SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. IEEE. https://doi.org/10.48550/arXiv.2009.04755, https://doi.org/10.1109/SC41405.2020.00105
Heldens, S, Hijma, P, van Werkhoven, B, Maassen, J, Bal, H & van Nieuwpoort, R 2021, Rocket: Efficient and scalable all-pairs computations on heterogeneous platforms. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis: [Proceedings], 9355286, International Conference for High Performance Computing, Networking, Storage and Analysis, SC, vol. 2020-November, IEEE Computer Society, 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2020, Virtual, Atlanta, United States, 9 November 2020. https://doi.org/10.1109/SC41405.2020.00105
Description: All-pairs compute problems apply a user-defined function to each combination of two items of a given data set. Although these problems present an abundance of parallelism, data reuse must be exploited to achieve good performance. Several researchers have considered this problem, resorting either to partial replication with static work distribution or to dynamic scheduling with full replication. In contrast, we present a solution that relies on hierarchical multi-level software-based caches to maximize data reuse at each level in the distributed memory hierarchy, combined with a divide-and-conquer approach to exploit data locality, hierarchical work-stealing to dynamically balance the workload, and asynchronous processing to maximize resource utilization. We evaluate our solution using three real-world applications, from digital forensics, localization microscopy, and bioinformatics, on different platforms, from a desktop machine to a supercomputer. Results show excellent efficiency and scalability when scaling to 96 GPUs, even obtaining super-linear speedups due to a distributed cache.
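To illustrate why the abstract pairs a software cache with a divide-and-conquer traversal, the following is a minimal, single-process sketch, not Rocket's implementation. It assumes one fixed-capacity LRU cache standing in for the multi-level distributed caches, and models "loading an item" as a cache miss. It does not model the hierarchical work-stealing or asynchronous processing described above. All names (`LRUCache`, `naive_pairs`, `tiled_pairs`, `run_all_pairs`) are hypothetical. The sketch enumerates every unordered pair (i, j) with i < j either row by row or by recursively splitting the pair space into tiles, then counts how many loads each order needs.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, List, Tuple


class LRUCache:
    """Hypothetical stand-in for a software cache: fixed capacity, LRU eviction.

    Every miss calls `loader`, which models an expensive load (disk, network, PCIe).
    """

    def __init__(self, capacity: int, loader: Callable[[int], object]):
        self.capacity = capacity
        self.loader = loader
        self.entries: "OrderedDict[int, object]" = OrderedDict()
        self.misses = 0

    def get(self, key: int) -> object:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.loader(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used item
        return value


def naive_pairs(n: int) -> Iterable[Tuple[int, int]]:
    """Row-major order: poor reuse once a row no longer fits in the cache."""
    for i in range(n):
        for j in range(i + 1, n):
            yield (i, j)


def tiled_pairs(r0: int, r1: int, c0: int, c1: int, leaf: int) -> Iterable[Tuple[int, int]]:
    """Divide-and-conquer order over rows [r0, r1) x columns [c0, c1), keeping i < j.

    The larger side is halved until a tile is at most leaf x leaf, so each tile
    touches at most 2 * leaf distinct items.
    """
    if c1 - 1 <= r0:  # largest column <= smallest row: tile has no pair with i < j
        return
    if r1 - r0 <= leaf and c1 - c0 <= leaf:
        for i in range(r0, r1):
            for j in range(max(c0, i + 1), c1):
                yield (i, j)
        return
    if r1 - r0 >= c1 - c0:
        mid = (r0 + r1) // 2
        yield from tiled_pairs(r0, mid, c0, c1, leaf)
        yield from tiled_pairs(mid, r1, c0, c1, leaf)
    else:
        mid = (c0 + c1) // 2
        yield from tiled_pairs(r0, r1, c0, mid, leaf)
        yield from tiled_pairs(r0, r1, mid, c1, leaf)


def run_all_pairs(
    items: List[object],
    func: Callable[[object, object], object],
    pairs: Iterable[Tuple[int, int]],
    capacity: int,
) -> Tuple[Dict[Tuple[int, int], object], int]:
    """Apply `func` to every pair in the given order; return results and miss count."""
    cache = LRUCache(capacity, loader=lambda k: items[k])
    results: Dict[Tuple[int, int], object] = {}
    for i, j in pairs:
        results[(i, j)] = func(cache.get(i), cache.get(j))
    return results, cache.misses


if __name__ == "__main__":
    n = 64
    items = list(range(n))

    def dist(a, b):
        return abs(a - b)  # hypothetical user-defined pairwise function

    _, naive_misses = run_all_pairs(items, dist, naive_pairs(n), capacity=16)
    _, tiled_misses = run_all_pairs(items, dist, tiled_pairs(0, n, 0, n, leaf=8), capacity=16)
    print("naive misses:", naive_misses, "tiled misses:", tiled_misses)
```

Both orders compute the same results. With `leaf=8` and a cache of 16 entries, every tile's working set fits in the cache, so each item is loaded at most once per tile. In the row-major order, by contrast, most items are reloaded for every row. The same locality argument applies at each level of a real memory hierarchy, which is what the abstract's multi-level caches exploit.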
Database: OpenAIRE