ExTensor
Author: | Christopher W. Fletcher, Joel Emer, Michael Pellauer, Hadi Asghari-Moghaddam, Aamer Jaleel, Kartik Hegde, Neal Crago, Edgar Solomonik |
---|---|
Year of publication: | 2019 |
Subject: |
010302 applied physics; Speedup; Computer science; 02 engineering and technology; Tensor algebra; Parallel computing; 01 natural sciences; CAS latency; 020202 computer hardware & architecture; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Bandwidth (computing); Hardware acceleration; Multiplication; Tensor; Throughput (business) |
Source: | MICRO |
DOI: | 10.1145/3352460.3358275 |
Description: | Generalized tensor algebra is a prime candidate for acceleration via customized ASICs. Modern tensors feature a wide range of data sparsity, with the density of non-zero elements ranging from 10^-6% to 50%. This paper proposes a novel approach to accelerate tensor kernels based on the principle of hierarchical elimination of computation in the presence of sparsity. This approach relies on rapidly finding intersections (situations where both operands of a multiplication are non-zero), enabling new data fetching mechanisms and avoiding the memory latency overheads associated with sparse kernels implemented in software. We propose the ExTensor accelerator, which builds these novel ideas on handling sparsity into hardware to enable better bandwidth utilization and compute throughput. We evaluate ExTensor on several kernels relative to industry libraries (Intel MKL) and state-of-the-art tensor algebra compilers (TACO). When bandwidth normalized, we demonstrate average speedups of 3.4×, 1.3×, 2.8×, 24.9×, and 2.7× on the SpMSpM, SpMM, TTV, TTM, and SDDMM kernels, respectively, over a server-class CPU. |
Database: | OpenAIRE |
External link: |
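
The intersection idea mentioned in the description can be illustrated in software. The following is a minimal sketch, assuming each sparse vector is stored as a sorted coordinate list with a parallel value list; the function names `intersect` and `sparse_dot` are hypothetical and chosen for illustration only. It shows the general two-pointer intersection technique, not the hardware mechanism ExTensor itself uses.

```python
def intersect(coords_a, coords_b):
    """Yield (pos_a, pos_b) for every coordinate present in both sorted lists.

    Assumption: both inputs are strictly increasing lists of the coordinates
    of non-zero elements (a compressed sparse format).
    """
    i = j = 0
    while i < len(coords_a) and j < len(coords_b):
        if coords_a[i] == coords_b[j]:
            # Both operands are non-zero here: this multiplication is effectual.
            yield i, j
            i += 1
            j += 1
        elif coords_a[i] < coords_b[j]:
            i += 1  # coordinate only in A: the product would be zero, skip it
        else:
            j += 1  # coordinate only in B: skip it


def sparse_dot(coords_a, vals_a, coords_b, vals_b):
    """Dot product that multiplies only at intersecting coordinates."""
    return sum(vals_a[i] * vals_b[j] for i, j in intersect(coords_a, coords_b))


if __name__ == "__main__":
    a_coords, a_vals = [0, 3, 5, 9], [1.0, 2.0, 3.0, 4.0]
    b_coords, b_vals = [3, 4, 9], [10.0, 20.0, 30.0]
    print(sparse_dot(a_coords, a_vals, b_coords, b_vals))
```

Only two of the possible multiplications are performed in this example (at coordinates 3 and 9); all others are skipped because at least one operand is zero. Applying the same check first to coarse blocks of coordinates and then to individual elements would be one way to approximate the hierarchical elimination the description refers to.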