On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and Beyond
Authors: | Grigori Fursin, Anton Lokhmotov, Bruno Carpentieri, Fabiana Zollo, Marco Cianfriglia, Damiano Perri, Osvaldo Gervasi, Paolo Sylos Labini, Cedric Nugteren, Flavio Vella |
---|---|
Year of publication: | 2021 |
Subject: | supervised classification; Computer science; Generalization; Decision tree; tuning; GPU computing; performance optimization; neural networks; predictive models; 02 engineering and technology; 01 natural sciences; Convolution; Set (abstract data type); 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; 010302 applied physics; 020203 distributed computing; Settore INF/01 - Informatica; Artificial neural network; business.industry; Deep learning; Computer engineering; Hardware and Architecture; Artificial intelligence; General-purpose computing on graphics processing units; Heuristics; business; Software; Information Systems |
Source: | ACM Transactions on Architecture and Code Optimization. 18:1-24 |
ISSN: | 1544-3973 1544-3566 |
DOI: | 10.1145/3434402 |
Description: | Efficient HPC libraries often expose multiple tunable parameters, algorithmic implementations, or a combination of them, to provide optimized routines. The optimal parameters and algorithmic choices may depend on input properties such as the shapes of the matrices involved in the operation. Traditionally, these parameters are tuned manually or set by auto-tuners. In emerging applications such as deep learning, this approach is not effective across the wide range of inputs and architectures used in practice. In this work, we analyze different machine learning techniques and predictive models to accelerate the convolution operator and GEMM. Moreover, we address the problem of dataset generation, and we study the performance, accuracy, and generalization ability of the models. Our insights allow us to improve the performance of computationally expensive deep learning primitives on high-end GPUs as well as low-power embedded GPU architectures, across three different libraries. Experimental results show significant improvements in the target applications, from 50% up to 300%, compared to auto-tuned and highly optimized vendor-based heuristics, using simple decision-tree- and MLP-based models. |
Database: | OpenAIRE |