Improving the performance of classical linear algebra iterative methods via hybrid parallelism

Authors: Pedro J. Martinez-Ferrer, Tufan Arslan, Vicenç Beltran
Contributors: Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Barcelona Supercomputing Center, Universitat Politècnica de Catalunya. PM - Programming Models
Year of publication: 2023
Subjects:
FOS: Computer and information sciences
Algebras, Linear
Computer Networks and Communications
G.1.3
Distributed-memory
Theoretical Computer Science
Shared-memory
Artificial Intelligence
Computer Science - Data Structures and Algorithms
Data Structures and Algorithms (cs.DS)
Linear algebra
Computer science::Computer architecture::Parallel architectures [UPC subject areas]
15-04
Computer Science - Performance
Parallel processing (Electronic computers)
Parallel processing (Computers)
I.6.3
Hybrid parallelism
Computer science::Theoretical computer science::Algorithms and complexity theory [UPC subject areas]
Performance (cs.PF)
Computer Science - Distributed, Parallel, and Cluster Computing
Hardware and Architecture
MPI
Distributed, Parallel, and Cluster Computing (cs.DC)
Software
DOI: 10.48550/arxiv.2305.05988
Abstract: We propose fork-join and task-based hybrid implementations of four classical linear algebra iterative methods (Jacobi, Gauss-Seidel, conjugate gradient and biconjugate gradient stabilised) as well as variations of them. The algorithms are documented in detail and the corresponding source code is made publicly available for reproducibility. Both weak and strong scalability benchmarks are conducted to statistically analyse their relative efficiencies. The weak scalability results demonstrate the superiority of a task-based hybrid parallelisation over MPI-only and fork-join hybrid implementations. Indeed, the task-based model achieves speedups up to 25% larger than its MPI-only counterpart, depending on the numerical method and the computational resources used. In strong scalability scenarios, task-based hybrid methods remain more efficient with moderate computational resources, where data locality does not play an important role. Fork-join hybridisation often yields mixed results and hence offers no competitive advantage over a much simpler MPI-only approach.
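As an illustration of the simplest of the four methods named above, the following is a minimal serial sketch of the Jacobi iteration in Python, assuming a small dense, diagonally dominant system stored as nested lists. It is not the authors' fork-join or task-based code; the function name `jacobi` and its parameters `tol` and `max_iter` are hypothetical. Every component of the new iterate depends only on the previous iterate, so the row updates within one sweep are mutually independent, which is the property that allows a sweep to be split across threads or tasks.

```python
def jacobi(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Solve A x = b with the Jacobi method (hypothetical serial sketch).

    A is a dense n x n matrix given as nested lists, b a list of length n.
    Returns the approximate solution and the number of sweeps performed.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    for k in range(1, max_iter + 1):
        # Each new component reads only the previous iterate x, so the
        # rows of this sweep could be computed in any order or in parallel.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop when the largest change between successive iterates is below tol.
        diff = max(abs(x_new[i] - x[i]) for i in range(n))
        x = x_new
        if diff < tol:
            return x, k
    return x, max_iter
```

For example, `jacobi([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])` converges to approximately `[0.1, 0.6]`. Gauss-Seidel differs by using each updated component immediately within the same sweep, which introduces dependencies between rows.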
Comment: 33 pages, 6 figures, accepted manuscript in Journal of Parallel and Distributed Computing
Database: OpenAIRE