Reordering Strategy for Blocking Optimization in Sparse Linear Solvers
Author: | Jean Roman, Grégoire Pichon, Mathieu Faverge, Pierre Ramet |
Contributors: | High-End Parallel Algorithms for Challenging Numerical Simulations (HiePACS), Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Centre National de la Recherche Scientifique (CNRS), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Plafrim, ANR-13-MONU-0007, SOLHAR, Solveurs pour architectures hétérogènes utilisant des supports d'exécution (2013) |
Language: | English |
Year of publication: | 2017 |
Subject: | Theoretical computer science; sparse matrix ordering; nested dissection; sparse block linear solver; heterogeneous architectures; parallel computing; linear algebra; linear system; CUDA; kernel (linear algebra); block (data storage); granularity; analysis; natural sciences; mathematics; applied mathematics; numerical and computational mathematics; AMS subject classifications: 05C50, 65F05, 65F50, 68Q25; [INFO.INFO-DC] Computer Science [cs]/Distributed, Parallel, and Cluster Computing [cs.DC] |
Source: | SIAM Journal on Matrix Analysis and Applications, Society for Industrial and Applied Mathematics, 2017, 38 (1), pp. 226-248. ⟨10.1137/16M1062454⟩ |
ISSN: | 0895-4798, 1095-7162 |
Description: | International audience; Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For this reason, optimizing their performance on modern architectures is critical. The preprocessing steps of sparse direct solvers—ordering and block-symbolic factorization—are two major steps that lead to a reduced amount of computation and memory and to a better task granularity to reach a good level of performance when using BLAS kernels. With the advent of GPUs, the granularity of the block computation has become more important than ever. In this paper, we present a reordering strategy that increases this block granularity. This strategy relies on block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, but it does not impact the number of operations required to solve the problem. We integrate this algorithm in the PaStiX solver and show an important reduction of the number of off-diagonal blocks on a large spectrum of matrices. This improvement leads to an increase in efficiency of up to 20% on GPUs. 1. Introduction. Many scientific applications, such as electromagnetism, astrophysics, and computational fluid dynamics, use numerical models that require solving linear systems of the form Ax = b. In those problems, the matrix A can be considered as either dense (almost no zero entries) or sparse (mostly zero entries). Due to multiple structural and numerical differences that appear in those problems, many different solutions exist to solve them. In this paper, we focus on problems leading to sparse systems with a symmetric pattern and, more specifically, on direct methods which factorize the matrix A into LL^t, LDL^t, or LU, with L, D, and U, respectively, unit lower triangular, diagonal, and upper triangular, according to the numerical properties of the problem. Those sparse matrices appear mostly when discretizing partial differential equations (PDEs) on two-dimensional (2D) and three-dimensional (3D) finite element or finite volume meshes. The main issue with such factorizations is the fill-in—zero entries becoming nonzero—that appears in the factorized form of A during the execution of the algorithm. If not correctly considered, the fill-in can transform the sparse matrix into a dense one which might not fit in memory. In this context, sparse direct solvers rely on two important preprocessing steps to reduce this fill-in and control where it appears. The first one finds a suitable ordering of the unknowns that aims at minimizing the fill-in to limit the memory overhead and floating point operations (Flops) required to complete the factorization. The problem is then transformed into (PAP^t)(Px) = Pb. (A minimal illustrative sketch of an ordered, permuted factorization follows this record.) |
Database: | OpenAIRE |
External link: |
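The description above outlines the role of a fill-reducing ordering and the permuted system (PAP^t)(Px) = Pb. The sketch below is not taken from the paper and does not use PaStiX; it is a minimal Python/SciPy illustration of the same idea, with reverse Cuthill-McKee standing in for the nested dissection orderings (Metis or Scotch) used in the paper's pipeline, and SuperLU standing in for a supernodal block solver. All names (n, perm, lu) and the test matrix are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import splu

# Small 2D 5-point Laplacian on an n-by-n grid: a sparse matrix with a
# symmetric pattern, the kind of system targeted by the paper.
n = 20
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = (kron(identity(n), T) + kron(T, identity(n))).tocsr()
b = np.ones(A.shape[0])

# Fill-reducing permutation.  Reverse Cuthill-McKee is used here only because
# SciPy ships it; the paper relies on nested dissection from Metis or Scotch.
perm = reverse_cuthill_mckee(A, symmetric_mode=True)

# Permuted system (P A P^t)(P x) = P b.
A_perm = A[perm, :][:, perm].tocsc()
b_perm = b[perm]

# Factorize with SuperLU, disabling its own column permutation so that the
# fill-in is governed by the ordering applied above.
lu = splu(A_perm, permc_spec="NATURAL")
y = lu.solve(b_perm)      # y = P x
x = np.empty_like(y)
x[perm] = y               # undo the permutation to recover x

print("nnz(A):", A.nnz, " nnz(L)+nnz(U):", lu.L.nnz + lu.U.nnz)
print("residual:", np.linalg.norm(A @ x - b))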