Write-Avoiding Algorithms

Authors: Nicholas Knight, James Demmel, Laura Grigori, Penporn Koanantakool, Oded Schwartz, Erin Carson, Harsha Vardhan Simhadri
Contributors: New York University (NYU); Department of Mathematics, University of California, Berkeley (UC Berkeley); Lawrence Berkeley National Laboratory (LBNL); Algorithms and parallel tools for integrated numerical simulations (ALPINES), Inria Paris-Rocquencourt; Laboratoire Jacques-Louis Lions (LJLL), Université Pierre et Marie Curie - Paris 6 (UPMC), Université Paris Diderot - Paris 7 (UPD7), Centre National de la Recherche Scientifique (CNRS); School of Engineering and Computer Science, The Hebrew University of Jerusalem (HUJ)
Year of publication: 2016
Source: Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS 2016), Chicago, United States
DOI: 10.1109/ipdps.2016.114
Description: Communication, i.e., moving data between levels of a memory hierarchy or between processors over a network, is much more expensive (in time or energy) than arithmetic. There has thus been a recent focus on designing algorithms that minimize communication and, when possible, attain lower bounds on the total number of reads and writes. However, most previous work does not distinguish between the costs of reads and writes. Writes can be much more expensive than reads in some current and emerging storage devices such as nonvolatile memories. This motivates us to ask whether there are lower bounds on the number of writes that certain algorithms must perform, and whether these bounds are asymptotically smaller than bounds on the sum of reads and writes together. When these smaller lower bounds exist, we then ask when they are attainable; we call algorithms that attain them "write-avoiding" (WA), to distinguish them from "communication-avoiding" (CA) algorithms, which only minimize the sum of reads and writes. We identify a number of cases in linear algebra and direct N-body methods where known CA algorithms are also WA (some are and some are not). We also identify classes of algorithms, including Strassen's matrix multiplication, the Cooley-Tukey FFT, and cache-oblivious algorithms for classical linear algebra, where a WA algorithm cannot exist: the number of writes is unavoidably within a constant factor of the total number of reads and writes. We explore the interaction of WA algorithms with cache replacement policies and argue that the Least Recently Used (LRU) policy works well with the WA algorithms in this paper. We provide empirical hardware counter measurements from Intel's Nehalem-EX microarchitecture to validate our theory. In the parallel case, for classical linear algebra, we show that it is impossible to attain lower bounds both on interprocessor communication and on writes to local memory, but either one is attainable by itself. Finally, we discuss WA algorithms for sparse iterative linear algebra.
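The asymmetry the abstract describes can be illustrated with a small counting sketch (not the paper's code). In a blocked classical matrix multiplication that keeps each output block of C resident in fast memory until it is fully accumulated, each C block is written back to slow memory exactly once, so slow-memory writes total n^2 while reads grow as 2n^3/b for block size b. The function name and the assumption that fast memory holds three b-by-b blocks are hypothetical, chosen only to make the read/write gap concrete.

```python
# Sketch: count slow-memory traffic for a blocked classical matmul, C = A*B,
# where each b-by-b block of C accumulates in fast memory and is written
# back exactly once. Assumes fast memory holds three b-by-b blocks
# (one each of A, B, C); this is an illustration, not the paper's algorithm.
def blocked_matmul_traffic(n, b):
    """Return (reads, writes) of elements moved between slow and fast memory
    for an n-by-n classical matmul with block size b (b must divide n)."""
    assert n % b == 0
    nb = n // b  # number of blocks per matrix dimension
    reads = writes = 0
    for i in range(nb):
        for j in range(nb):
            # C[i][j] stays in fast memory; no intermediate writes occur.
            for k in range(nb):
                reads += 2 * b * b  # fetch blocks A[i][k] and B[k][j]
            writes += b * b         # write the finished C[i][j] block once
    return reads, writes

reads, writes = blocked_matmul_traffic(n=64, b=8)
# reads = 2*n^3/b = 65536, writes = n^2 = 4096: writes are asymptotically
# fewer than reads, the property WA algorithms aim to preserve.
```

Doubling n quadruples the writes (n^2) but multiplies the reads by eight (n^3 for fixed b), which is the kind of asymmetric scaling that makes a write-avoiding schedule attractive on devices where writes are costly.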
Database: OpenAIRE