The Next 700 Accelerated Layers
Author: Theodoros Theodoridis, William S. Moses, Albert Cohen, Sven Verdoolaege, Priya Goyal, Oleksandr Zinenko, Andrew Adams, Nicolas Vasilache, Zachary DeVito
Year of publication: 2019
Subject: deep learning; optimizing compilers; evolutionary algorithms; tensor algebra; Ricci calculus; parallel computing; distributed computing; CUDA; synchronization; hardware acceleration; hardware and architecture; artificial intelligence; software; information systems
Source: ACM Transactions on Architecture and Code Optimization, 16:1–26
ISSN: 1544-3973, 1544-3566
DOI: 10.1145/3355606
Description: Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and with a performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a just-in-time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds, that of highly tuned libraries.
Database: OpenAIRE
External link:
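
The description above refers to a domain-specific language whose tensor notation stays close to the mathematics of deep learning, compiled just-in-time into CUDA kernels and exposed to PyTorch. The sketch below is only an illustration of that idea: the `tensor_comprehensions` package name, the `tc.define` entry point, and its call signature are assumptions based on the project's public releases, not details given in this record.

```python
# Minimal sketch, not the authoritative API: a layer expressed in tensor
# notation and a hypothetical JIT invocation from PyTorch.
import torch
import tensor_comprehensions as tc  # assumed package name

# A matrix multiplication in the tensor notation described in the abstract:
# the index k appears only on the right-hand side and is therefore reduced,
# and "+=!" zero-initializes the output before accumulating.
lang = """
def matmul(float(M, K) A, float(K, N) B) -> (C) {
    C(m, n) +=! A(m, k) * B(k, n)
}
"""

# Assumed entry point: JIT-compile the definition into a CUDA kernel; the
# paper's flow tunes the generated code with polyhedral scheduling plus an
# evolutionary autotuner before execution.
matmul = tc.define(lang, name="matmul")

A = torch.randn(128, 64).cuda()
B = torch.randn(64, 256).cuda()
C = matmul(A, B)  # runs the synthesized GPU kernel on PyTorch tensors
```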