Showing 1 - 10 of 113 for search: '"BOMAN, ERIK G."'
On current computer architectures, GMRES' performance can be limited by its communication cost to generate orthonormal basis vectors of the Krylov subspace. To address this performance bottleneck, its $s$-step variant orthogonalizes a block of $s$ ba
External link:
http://arxiv.org/abs/2402.15033
CholeskyQR2 and shifted CholeskyQR3 are two state-of-the-art algorithms for computing tall-and-skinny QR factorizations since they attain high performance on current computer architectures. However, to guarantee stability, for some applications, Chol
External link:
http://arxiv.org/abs/2309.05868
The multilevel heuristic is the dominant strategy for high-quality sequential and parallel graph partitioning. Partition refinement is a key step of multilevel graph partitioning. In this work, we present Jet, a new parallel algorithm for partition r
External link:
http://arxiv.org/abs/2304.13194
Author:
Loe, Jennifer A., Glusa, Christian A., Yamazaki, Ichitaro, Boman, Erik G., Rajamanickam, Sivasankaran
Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require dou
External link:
http://arxiv.org/abs/2109.01232
Graph coloring is often used in parallelizing scientific computations that run in distributed and multi-GPU environments; it identifies sets of independent data that can be updated in parallel. Many algorithms exist for graph coloring on a single GPU
External link:
http://arxiv.org/abs/2107.00075
Author:
Loe, Jennifer A., Glusa, Christian A., Yamazaki, Ichitaro, Boman, Erik G., Rajamanickam, Sivasankaran
Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems require dou
External link:
http://arxiv.org/abs/2105.07544
Graph partitioning has been an important tool to partition the work among several processors to minimize the communication cost and balance the workload. While accelerator-based supercomputers are emerging to be the standard, the use of graph partiti
External link:
http://arxiv.org/abs/2105.00578
Author:
Bielich, Daniel, Langou, Julien, Thomas, Stephen, Swirydowicz, Kasia, Yamazaki, Ichitaro, Boman, Erik G.
The parallel strong-scaling of Krylov iterative methods is largely determined by the number of global reductions required at each iteration. The GMRES and Krylov-Schur algorithms employ the Arnoldi algorithm for nonsymmetric matrices. The underlying
External link:
http://arxiv.org/abs/2104.01253
Author:
Abdelfattah, Ahmad, Anzt, Hartwig, Boman, Erik G., Carson, Erin, Cojean, Terry, Dongarra, Jack, Gates, Mark, Grützmacher, Thomas, Higham, Nicholas J., Li, Sherry, Lindquist, Neil, Liu, Yang, Loe, Jennifer, Luszczek, Piotr, Nayak, Pratik, Pranesh, Sri, Rajamanickam, Siva, Ribizel, Tobias, Smith, Barry, Swirydowicz, Kasia, Thomas, Stephen, Tomov, Stanimire, Tsai, Yaohung M., Yamazaki, Ichitaro, Yang, Ulrike Meier
Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the server-line pro
External link:
http://arxiv.org/abs/2007.06674
Author:
Ahrens, Willow, Boman, Erik G.
The Variable Block Row (VBR) format is an influential blocked sparse matrix format designed for matrices with shared sparsity structure between adjacent rows and columns. VBR groups adjacent rows and columns, storing the resulting blocks that contain
External link:
http://arxiv.org/abs/2005.12414