Algorithm 1019 : A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation

Autor: Mirko Myllykoski
Jazyk: angličtina
Rok vydání: 2022
Předmět:
FOS: Computer and information sciences
real Schur form
distributed memory
Beräkningsmatematik
MathematicsofComputing_NUMERICALANALYSIS
GPU
97N80
15A18
65F15
65Y05
68W10
68W15

multi-shift
StarPU
G.1.3
010103 numerical & computational mathematics
02 engineering and technology
QZ algorithm
01 natural sciences
shared memory
Eigenvalue problem
FOS: Mathematics
0202 electrical engineering
electronic engineering
information engineering

Mathematics - Numerical Analysis
0101 mathematics
task-based
aggressive early deflation
QR algorithm
Computer Sciences
Applied Mathematics
Numerical Analysis (math.NA)
Computational Mathematics
Datavetenskap (datalogi)
Computer Science - Distributed
Parallel
and Cluster Computing

F.1.2
F.2.1
Computer Science - Mathematical Software
020201 artificial intelligence & image processing
MPI
Distributed
Parallel
and Cluster Computing (cs.DC)

Mathematical Software (cs.MS)
Software
Popis: The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library.
Comment: 36 pages, 20 figures, 9 tables. Peer-reviewed version
Databáze: OpenAIRE