Algorithm 1019 : A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation

Autor:	Mirko Myllykoski
Jazyk:	angličtina
Rok vydání:	2022
Předmět:	FOS: Computer and information sciences real Schur form distributed memory Beräkningsmatematik MathematicsofComputing_NUMERICALANALYSIS GPU 97N80 15A18 65F15 65Y05 68W10 68W15 multi-shift StarPU G.1.3 010103 numerical & computational mathematics 02 engineering and technology QZ algorithm 01 natural sciences shared memory Eigenvalue problem FOS: Mathematics 0202 electrical engineering electronic engineering information engineering Mathematics - Numerical Analysis 0101 mathematics task-based aggressive early deflation QR algorithm Computer Sciences Applied Mathematics Numerical Analysis (math.NA) Computational Mathematics Datavetenskap (datalogi) Computer Science - Distributed Parallel and Cluster Computing F.1.2 F.2.1 Computer Science - Mathematical Software 020201 artificial intelligence & image processing MPI Distributed Parallel and Cluster Computing (cs.DC) Mathematical Software (cs.MS) Software
Popis:	The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated to be multiple times faster than multi-threaded LAPACK and ScaLAPACK in both single-node and multi-node configurations on two different machines based on Intel and AMD CPUs. The implementation is built on top of the StarPU runtime system and is part of the open-source StarNEig library. Comment: 36 pages, 20 figures, 9 tables. Peer-reviewed version
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::40ba854aad56a28113aa4b9ed85143b1 http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-190558 Zobrazit plný text záznamu