Showing 1 - 8 of 8 for search: '"Rolinger, Thomas B."'
Achieving high performance for Sparse Matrix-Matrix Multiplication (SpMM) has received increasing research attention, especially on multi-core CPUs, due to the large input data size in applications such as graph neural networks (GNNs). Most existing s
External link:
http://arxiv.org/abs/2312.05639
Irregular memory access patterns pose performance and user productivity challenges on distributed-memory systems. They can lead to fine-grained remote communication and the data access patterns are often not known until runtime. The Partitioned Globa
External link:
http://arxiv.org/abs/2303.13954
Applications for deep learning and big data analytics have compute and memory requirements that exceed the limits of a single GPU. However, effectively scaling out an application to multiple GPUs is challenging due to the complexities of communicatio
External link:
http://arxiv.org/abs/1812.05964
In big-data analytics, using tensor decomposition to extract patterns from large, sparse multivariate data is a popular technique. Many challenges exist for designing parallel, high performance tensor decomposition algorithms due to irregular data ac
External link:
http://arxiv.org/abs/1812.05961
Achieving high performance for sparse applications is challenging due to irregular access patterns and weak locality. These properties preclude many static optimizations and degrade cache performance on traditional systems. To address these challenge
External link:
http://arxiv.org/abs/1812.05955
Published in:
Journal of Parallel and Distributed Computing, July 2019, 129:83-98