Adaptive Communication for Distributed Deep Learning on Commodity GPU Cluster

Author: Li-Yung Ho, Jan-Jan Wu, Pangfeng Liu
Year of publication: 2018
Subject:
Source: CCGrid
DOI: 10.1109/ccgrid.2018.00043
Description: Deep learning is now the most promising approach to developing computer systems with human-like intelligence. To speed up the development of neural networks, researchers have designed many distributed learning algorithms to facilitate the training process. These algorithms typically use a constant to set the communication period for model/gradient exchange. We find that this type of communication pattern can incur unnecessary and inefficient data transmission for some training methods, e.g., elastic SGD and gossiping SGD. In this paper, we propose an adaptive communication method to improve the performance of gossiping SGD. Instead of using a fixed period for model exchange, we exchange models with other machines according to how much the local model has changed. This makes communication more efficient and thus improves performance. The experimental results show that, compared with gossiping SGD, our method reduces communication traffic by 92%, which results in a 52% reduction in training time while preserving prediction accuracy.
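The description states the idea only at a high level: exchange models when the local model has changed enough, rather than on a fixed schedule. As a minimal sketch, assuming a relative L2-norm change as the trigger and pairwise model averaging as the gossip step (neither detail is specified in the description), a change-triggered exchange could look like the following. The function names, the 0.05 threshold, and the toy quadratic objective are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def relative_change(current: Vector, last_sent: Vector) -> float:
    """Relative L2 distance between the current local model and the model
    this worker last exchanged (hypothetical trigger metric)."""
    diff = [c - s for c, s in zip(current, last_sent)]
    denom = l2_norm(last_sent)
    return l2_norm(diff) / denom if denom > 0 else float("inf")


def should_exchange(current: Vector, last_sent: Vector, threshold: float) -> bool:
    # Adaptive rule (assumed): communicate only once the model has drifted enough.
    return relative_change(current, last_sent) >= threshold


def gossip_average(a: Vector, b: Vector) -> Vector:
    """Pairwise model averaging, the basic gossip step."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def train(num_workers: int = 4, steps: int = 200, lr: float = 0.05,
          threshold: float = 0.05, seed: int = 0) -> Tuple[List[Vector], int]:
    rng = random.Random(seed)
    # Toy objective (assumption): every worker minimizes ||w - target||^2
    # using gradients perturbed by Gaussian noise to mimic minibatch SGD.
    target = [1.0, -2.0, 3.0]
    models = [[rng.uniform(-1, 1) for _ in target] for _ in range(num_workers)]
    last_sent = [list(m) for m in models]
    messages = 0
    for _ in range(steps):
        for i in range(num_workers):
            grad = [2 * (w - t) + rng.gauss(0, 0.1) for w, t in zip(models[i], target)]
            models[i] = [w - lr * g for w, g in zip(models[i], grad)]
            if should_exchange(models[i], last_sent[i], threshold):
                # Pick a random peer and average models (one message per exchange).
                j = rng.choice([k for k in range(num_workers) if k != i])
                avg = gossip_average(models[i], models[j])
                models[i], models[j] = avg, list(avg)
                # Both participants now hold the averaged model, so both reset
                # their reference point for measuring future change.
                last_sent[i] = list(avg)
                last_sent[j] = list(avg)
                messages += 1
    return models, messages


if __name__ == "__main__":
    final_models, sent = train()
    print("messages sent:", sent, "of a possible", 4 * 200)
```

With a fixed period of one step, the same run would send `num_workers * steps` messages. In this sketch a worker stops communicating once its local model stops moving much, which illustrates the intuition behind the adaptive scheme; it does not reproduce the paper's method or its reported numbers.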
Database: OpenAIRE