Solving dense symmetric indefinite systems using GPUs

Author:	Adrien Rémy, Ichitaro Yamazaki, Jack Dongarra, Marc Baboulin, Stanimire Tomov
Contributors:	Systèmes parallèles (LRI) (ParSys - LRI), Laboratoire de Recherche en Informatique (LRI), Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS)-Université Paris-Sud - Paris 11 (UP11)-CentraleSupélec-Centre National de la Recherche Scientifique (CNRS), The University of Tennessee [Knoxville]
Publication year:	2017
Subject:	symmetric pivoting; Computer Networks and Communications; Computer science; Graphics processing unit; 010103 numerical & computational mathematics; 02 engineering and technology; Parallel computing; randomization; System of linear equations; 01 natural sciences; Single-precision floating-point format; Theoretical Computer Science; Matrix (mathematics); Factorization; Iterative refinement; 0202 electrical engineering, electronic engineering, information engineering; 0101 mathematics; dense symmetric indefinite systems; Multi-core processor; communication-avoiding; [INFO.INFO-NA]Computer Science [cs]/Numerical Analysis [cs.NA]; Solver; Computer Science Applications; GPU computation; Computational Theory and Mathematics; 020201 artificial intelligence & image processing; Central processing unit; [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; Software; Numerical stability
Source:	Concurrency and Computation: Practice and Experience, Wiley, 2017, 29 (9), pp.1-17. ⟨10.1002/cpe.4055⟩
ISSN:	1532-0634, 1532-0626
DOI:	10.1002/cpe.4055
Description:	Summary: This paper studies the performance of different algorithms for solving a dense symmetric indefinite linear system of equations on multicore CPUs with a Graphics Processing Unit (GPU). Pivoting is required to ensure the numerical stability of the factorization. Obtaining high performance from such algorithms on the GPU is difficult because all existing pivoting strategies lead to frequent synchronizations and irregular data accesses. Until recently, no implementation of these algorithms existed for a hybrid CPU/GPU architecture. To improve their performance on the hybrid architecture, we explore different techniques for reducing the expensive data transfer and synchronization between the CPU and GPU, or on the GPU itself (e.g., factorizing the matrix entirely on the GPU or in a communication-avoiding fashion). We also study the performance of a solver that applies iterative refinement to a factorization without pivoting, combined either with a preprocessing technique based on random butterfly transformations or with a mixed-precision algorithm in which the matrix is factorized in single precision. The randomization algorithm has only a probabilistic proof of numerical stability, and for the mixed-precision algorithm this paper focuses only on the variant without pivoting. Nevertheless, the two approaches demonstrate that good performance can be obtained on the GPU by avoiding pivoting and by using lower-precision arithmetic, respectively. As illustrated by the acoustics application studied in this paper, in many practical cases the matrices can be factorized without pivoting. Because the componentwise backward error computed during iterative refinement signals when the algorithm has failed to reach the desired accuracy, the user can rely on these potentially unstable but efficient algorithms in most cases and fall back to a more stable algorithm with pivoting only when a failure is detected. Copyright © 2017 John Wiley & Sons, Ltd.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::10a2e4b180dd6b9a4b281f7abd485ce6 https://doi.org/10.1002/cpe.4055
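
Illustrative note: the abstract describes factorizing without pivoting in single precision, refining the solution iteratively, and using the componentwise backward error to detect failure. The following is a minimal CPU-only sketch of that idea in plain Python, not the authors' GPU implementation. All function names (`to_single`, `ldlt_no_pivot`, `ldlt_solve`, `backward_error`, `solve_mixed_precision`), the tolerance `tol`, and the iteration limit `max_iter` are assumptions chosen for illustration; single precision is emulated by rounding each intermediate result through the `struct` module.

```python
import struct

def to_single(x):
    """Round a double to the nearest IEEE single-precision value (emulation)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ldlt_no_pivot(A, rnd=to_single):
    """Factor symmetric A as L*D*L^T WITHOUT pivoting, rounding each result with rnd."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        s = rnd(A[j][j])
        for k in range(j):
            s = rnd(s - rnd(rnd(L[j][k] * L[j][k]) * d[k]))
        if s == 0.0:
            # Without pivoting, a zero pivot means the factorization breaks down.
            raise ZeroDivisionError("zero pivot in factorization without pivoting")
        d[j] = s
        for i in range(j + 1, n):
            t = rnd(A[i][j])
            for k in range(j):
                t = rnd(t - rnd(rnd(L[i][k] * L[j][k]) * d[k]))
            L[i][j] = rnd(t / s)
    return L, d

def ldlt_solve(L, d, b, rnd=to_single):
    """Solve L*D*L^T x = b: forward substitution, diagonal scaling, back substitution."""
    n = len(b)
    y = [rnd(v) for v in b]
    for i in range(n):
        for k in range(i):
            y[i] = rnd(y[i] - rnd(L[i][k] * y[k]))
    z = [rnd(y[i] / d[i]) for i in range(n)]
    for i in reversed(range(n)):
        for k in range(i + 1, n):
            z[i] = rnd(z[i] - rnd(L[k][i] * z[k]))
    return z

def backward_error(A, x, b):
    """Return (max_i |b - A x|_i / (|A||x| + |b|)_i, residual), computed in double."""
    n = len(b)
    r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    den = [sum(abs(A[i][j] * x[j]) for j in range(n)) + abs(b[i]) for i in range(n)]
    omega = max(abs(r[i]) / den[i] if den[i] > 0 else 0.0 for i in range(n))
    return omega, r

def solve_mixed_precision(A, b, tol=1e-12, max_iter=20):
    """Low-precision factorization plus double-precision iterative refinement.

    Returns (x, omega, converged). converged == False is the signal that a
    caller should fall back to a stable solver with pivoting.
    """
    n = len(b)
    L, d = ldlt_no_pivot(A)              # factor once, in emulated single precision
    x = ldlt_solve(L, d, b)
    for _ in range(max_iter):
        omega, r = backward_error(A, x, b)
        if omega <= tol:
            return x, omega, True
        c = ldlt_solve(L, d, r)          # correction reuses the cheap factors
        x = [x[i] + c[i] for i in range(n)]
    omega, _ = backward_error(A, x, b)
    return x, omega, omega <= tol
```

A caller would check the returned `converged` flag (or catch the zero-pivot exception) and only then switch to a pivoted factorization such as Bunch-Kaufman, which is not shown here.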