Exploiting Parallelism Opportunities with Deep Learning Frameworks
Author: | David Brooks, Carole-Jean Wu, Yu Emma Wang, Xiaodong Wang, Kim Hazelwood |
---|---|
Year of publication: | 2020 |
Subject: | Machine Learning (cs.LG); Statistics - Machine Learning (stat.ML); Performance (cs.PF); Distributed, Parallel, and Cluster Computing (cs.DC); deep learning; machine learning frameworks; parallelism; speedup; inference; profiling; performance tuning; hardware and architecture; software; information systems |
Source: | ACM Transactions on Architecture and Code Optimization. 18:1-23 |
ISSN: | 1544-3973; 1544-3566 |
Description: | State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal setting in such feature-rich frameworks, however, requires a non-trivial amount of performance profiling effort and often relies on domain-specific knowledge. This article takes a deep dive into analyzing the performance impact of key design features in a machine learning framework and quantifies the role of parallelism. The observations and insights distill into a simple set of guidelines that one can use to achieve much higher training and inference speedups. Across a diverse set of real-world deep learning models, the evaluation results show that the proposed performance tuning guidelines outperform the Intel and TensorFlow recommended settings by 1.30× and 1.38×, respectively. |
Database: | OpenAIRE |
External link: |
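The description above refers to the "Intel and TensorFlow recommended settings" for parallelism. The article's own guidelines are not reproduced in this record; as a minimal sketch of the kind of CPU thread-tuning knobs such recommendations cover, the following sets the OpenMP/KMP environment variables honored by oneDNN-backed TensorFlow builds. The helper name `configure_parallelism` and the specific values are illustrative assumptions, not the article's results.

```python
import os

def configure_parallelism(num_physical_cores: int) -> dict:
    """Sketch of common CPU thread-tuning knobs for deep learning
    frameworks (illustrative values, not the article's guidelines)."""
    settings = {
        # OpenMP worker count used by oneDNN/MKL-DNN compute kernels
        "OMP_NUM_THREADS": str(num_physical_cores),
        # Pin OpenMP threads to cores to avoid migration overhead
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
        # Milliseconds a thread spins before sleeping after a parallel region
        "KMP_BLOCKTIME": "1",
    }
    os.environ.update(settings)
    return settings

cfg = configure_parallelism(num_physical_cores=16)
print(cfg["OMP_NUM_THREADS"])  # prints "16"
```

Frameworks typically expose matching in-process controls as well (for example, TensorFlow's intra-op and inter-op thread-pool sizes); the article evaluates how such settings interact across real-world models.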