TensorLightning: A Traffic-Efficient Distributed Deep Learning on Commodity Spark Clusters
Author: Se-Il Lee, Sungroh Yoon, Chang-Sung Jeong, Jaehong Park, Jaehee Jang, Hanjoo Kim
Language: English
Year of publication: 2018
Subject: General Computer Science; General Engineering; General Materials Science; Distributed computing; Distributed systems; Deep learning; Artificial intelligence; Artificial neural networks; Convolutional neural networks; Recurrent neural networks; Stochastic gradient descent; Asynchronous communication; Apache Spark; TensorLightning; Commodity servers; Cluster analysis; Networking & telecommunications; lcsh:Electrical engineering. Electronics. Nuclear engineering; lcsh:TK1-9971
Source: IEEE Access, Vol. 6, pp. 27671-27680 (2018)
ISSN: 2169-3536
Description: With the recent success of deep learning, the amount of data and computation involved in training continues to grow daily; hence, distributed deep learning systems that share the training workload have been studied extensively. Although scale-out distributed environments built from commodity servers are widely used, they are limited by synchronous operation and communication traffic, and combining deep neural network (DNN) training with existing clusters often demands additional hardware and migration between different cluster frameworks or libraries, which is highly inefficient. We therefore propose TensorLightning, which integrates the widely used data pipeline of Apache Spark with the powerful deep learning libraries Caffe and TensorFlow. TensorLightning introduces a new parameter aggregation algorithm and a parallel asynchronous parameter-managing scheme to relieve communication discrepancies and overhead. We redesign the elastic averaging stochastic gradient descent (EASGD) algorithm so that parameters are exchanged in pruned, sparse form (see the sketch after this record). Our approach provides fast and flexible DNN training with high accessibility. We evaluated the proposed framework with convolutional and recurrent neural network models; it reduces network traffic by 67% while converging faster.
Database: OpenAIRE
External link:
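The abstract's central technical claim is an EASGD variant in which workers exchange pruned, sparse-form parameter differences that the server folds in asynchronously. The following minimal NumPy sketch illustrates that idea under stated assumptions: the top-k magnitude pruning criterion, the 10% retention rate, and the names `sparsify`, `worker_step`, and `server_apply` are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def sparsify(delta, k):
    """Keep only the k largest-magnitude entries (pruned, sparse form)."""
    idx = np.argpartition(np.abs(delta), -k)[-k:]
    return idx, delta[idx]

def worker_step(x, center, grad, lr=0.01, rho=0.1, k=None):
    """One elastic-averaging SGD step on a worker.

    x      : local parameter vector
    center : latest copy of the server's center variable
    grad   : stochastic gradient at x
    """
    elastic = rho * (x - center)            # elastic force pulling toward the center
    x_new = x - lr * (grad + elastic)       # standard EASGD local update
    # Communicate only a pruned, sparse form of the elastic difference
    # instead of the full dense vector (assumed: keep top 10% by magnitude).
    k = k or max(1, int(0.1 * x.size))
    idx, vals = sparsify(x_new - center, k)
    return x_new, (idx, vals)

def server_apply(center, sparse_update, alpha=0.1):
    """Asynchronously fold one worker's sparse update into the center variable."""
    idx, vals = sparse_update
    center[idx] += alpha * vals             # move the center toward the worker
    return center
```

A toy round trip, with a random vector standing in for a real minibatch gradient:

```python
dim = 1000
rng = np.random.default_rng(0)
center = np.zeros(dim)
x = 0.01 * rng.standard_normal(dim)
grad = rng.standard_normal(dim)             # stand-in for a minibatch gradient
x, update = worker_step(x, center.copy(), grad)
center = server_apply(center, update)
```

The design point this sketch captures is that each worker transmits only (index, value) pairs rather than the full parameter vector, which is the mechanism consistent with the abstract's reported 67% reduction in network traffic.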