Showing 1 - 2 of 2 for search: '"Smith, Tyler Michael"'
A tight lower bound for required I/O when computing an ordinary matrix-matrix multiplication on a processor with two layers of memory is established. Prior work obtained weaker lower bounds by reasoning about the number of segments needed to perform
External link:
http://arxiv.org/abs/1702.02017
Authors:
Veras, Richard Michael; Low, Tze Meng; Smith, Tyler Michael; van de Geijn, Robert; Franchetti, Franz
High performance dense linear algebra (DLA) libraries often rely on a general matrix multiply (Gemm) kernel that is implemented using assembly or with vector intrinsics. In particular, the real-valued Gemm kernels provide the overwhelming fraction of
External link:
http://arxiv.org/abs/1611.08035