Efficient distributed machine learning for large-scale models by reducing redundant communication

Author: Harumichi Yokoyama, Takuya Araki
Publication Year: 2017
Subject:
Source: SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI
DOI: 10.1109/uic-atc.2017.8397638
Description: Distributed machine learning is used to train large-scale models within a moderate amount of time. To accelerate training, all nodes must exchange their calculation results frequently; however, communicating the updated parameters imposes a large overhead on the total execution time. This paper proposes a communication method that decreases redundant transmissions of parameters without changing the semantics of the algorithm. Before training, we identify the parts of the collective communication that can be omitted and replace them with direct communication between the nodes that request the intermediate results. We implemented this method and evaluated it on a cluster of five nodes connected by 10-Gbps Ethernet. An evaluation using a real dataset showed that our method reduces the number of elements exchanged between nodes and shortens the communication time.
Database: OpenAIRE
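
The abstract does not spell out the mechanism, but the general idea can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration, not the paper's implementation: it assumes each node holds one block of intermediate results and that a precomputed requests map (an assumption introduced here, not taken from the source) records which nodes actually need which blocks. It then compares the number of elements moved by a full allgather-style exchange against direct transfers of only the requested blocks.

    """Illustrative sketch (not the authors' implementation) of the idea in the
    abstract: omit the parts of a collective exchange that no node needs and
    send the remaining intermediate results directly to the nodes that request
    them. Node count matches the abstract's cluster; parameter size and the
    `requests` map are hypothetical values chosen for illustration only."""

    NUM_NODES = 5            # cluster size mentioned in the abstract
    PARAMS_PER_NODE = 1000   # hypothetical number of elements per node

    # Hypothetical access pattern determined before training:
    # requests[i] = set of nodes whose intermediate results node i needs.
    requests = {
        0: {1, 2},
        1: {0},
        2: {0, 3},
        3: {2, 4},
        4: {3},
    }


    def elements_allgather(num_nodes: int, params_per_node: int) -> int:
        """Baseline: every node receives every other node's full result."""
        return num_nodes * (num_nodes - 1) * params_per_node


    def elements_direct(reqs: dict, params_per_node: int) -> int:
        """Direct exchange: each node receives only the blocks it requested."""
        return sum(len(sources) for sources in reqs.values()) * params_per_node


    if __name__ == "__main__":
        baseline = elements_allgather(NUM_NODES, PARAMS_PER_NODE)
        direct = elements_direct(requests, PARAMS_PER_NODE)
        print(f"allgather elements:   {baseline}")
        print(f"direct-only elements: {direct}")
        print(f"reduction:            {1 - direct / baseline:.0%}")

Because the requests map is fixed before training, the decision of which transfers to omit costs nothing during the iterations themselves, which is consistent with the abstract's claim that the algorithm's semantics are unchanged while the communicated volume shrinks.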