Author:	Aberdeen, Douglas; Baxter, Jonathan
Source:	Concurrency: Practice and Experience; February 2001, Vol. 13, Issue 2, pp. 103-119 (17 pp.)
Abstract:	Generalized matrix-matrix multiplication forms the kernel of many mathematical algorithms, so a faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the Intel Pentium single instruction, multiple data (SIMD) floating-point architecture. The main difficulty with the Pentium and other commodity processors is the need to use the cache hierarchy efficiently, particularly given the growing gap between main-memory and CPU clock speeds. We give a detailed description of the register allocation and the Level 1 and Level 2 cache-blocking strategies that yield the best performance for the Pentium III family. Our results show performance that is on average 2.09 times faster than the leading public-domain matrix-matrix multiply routines, and comparable to Intel's SIMD small matrix-matrix multiply routines. Copyright © 2001 John Wiley & Sons, Ltd.
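To illustrate the cache-blocking idea the abstract refers to, here is a minimal sketch in C. It is not the authors' code: it tiles C += A * B so that each tile of A, B and C is reused while it is still resident in cache. It is scalar (no SIMD intrinsics and no register blocking), and the names blocked_sgemm and naive_sgemm and the tile size BLOCK = 32 are hypothetical choices for illustration. The tile size is an assumption picked so that three 32 x 32 float tiles occupy 12 KiB; it is not a value taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size (assumption, not from the paper):
   three BLOCK x BLOCK float tiles = 3 * 32 * 32 * 4 bytes = 12 KiB. */
#define BLOCK 32

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Cache-blocked C (m x n) += A (m x k) * B (k x n); all row-major, float. */
void blocked_sgemm(size_t m, size_t n, size_t k,
                   const float *A, const float *B, float *C)
{
    for (size_t i0 = 0; i0 < m; i0 += BLOCK)
        for (size_t p0 = 0; p0 < k; p0 += BLOCK)
            for (size_t j0 = 0; j0 < n; j0 += BLOCK) {
                /* Clip each tile at the matrix edge for ragged sizes. */
                size_t i1 = min_sz(i0 + BLOCK, m);
                size_t p1 = min_sz(p0 + BLOCK, k);
                size_t j1 = min_sz(j0 + BLOCK, n);
                /* Multiply one tile of A by one tile of B into a tile of C;
                   the inner j loop walks B and C with unit stride. */
                for (size_t i = i0; i < i1; ++i)
                    for (size_t p = p0; p < p1; ++p) {
                        float a = A[i * k + p];
                        for (size_t j = j0; j < j1; ++j)
                            C[i * n + j] += a * B[p * n + j];
                    }
            }
}

/* Unblocked reference: C (m x n) += A (m x k) * B (k x n). */
void naive_sgemm(size_t m, size_t n, size_t k,
                 const float *A, const float *B, float *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; ++p)
                sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] += sum;
        }
}
```

A tuned kernel of the kind the abstract describes would add a second, larger blocking level aimed at the Level 2 cache and replace the innermost loop with SIMD, register-allocated code; this sketch shows only a single level of tiling.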
Database:	Supplemental Index