swFLOW: A Dataflow Deep Learning Framework on Sunway TaihuLight Supercomputer
Author: | Jose Monsalve Diaz, Mingfan Li, Guang R. Gao, Han Lin, Lin Zeng, Hong An |
---|---|
Year of publication: | 2019 |
Subject: | Instruction prefetch; Remote direct memory access; Speedup; Dataflow; Computer science; business.industry; Deep learning; 020207 software engineering; 02 engineering and technology; Parallel computing; computer.file_format; 010501 environmental sciences; computer.software_genre; Supercomputer; 01 natural sciences; Bottleneck; Software framework; Stochastic gradient descent; 0202 electrical engineering, electronic engineering, information engineering; Executable; Artificial intelligence; business; computer; 0105 earth and related environmental sciences; Sunway TaihuLight |
Source: | HPCC/SmartCity/DSS |
DOI: | 10.1109/hpcc/smartcity/dss.2019.00345 |
Description: | Deep learning is widely used in many modern fields, and a number of deep learning models and software frameworks have been proposed. However, it remains difficult to run deep learning tasks efficiently on traditional high-performance computing (HPC) systems with specialized architectures, such as Sunway TaihuLight. In this paper, we propose swFLOW, a TensorFlow-based dataflow deep learning framework for Sunway TaihuLight. Guided by performance analysis of a convolutional neural network (CNN), we optimize the convolution layer and reduce data layout transpose operations, achieving a 10.42x speedup over the single management processing element (MPE) version. For distributed training, we use the elastic averaging stochastic gradient descent (EASGD) algorithm to reduce communication, and we use data prefetching so that data fetching does not become a performance bottleneck. On 512 processes, we achieve a parallel efficiency of 81.01% with communication period τ = 8. Limited by the maximal executable batch size, the current performance of swFLOW is far from optimal. Further optimization with techniques such as remote direct memory access (RDMA) and model parallelism is needed. |
Database: | OpenAIRE |
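
The description mentions EASGD with a communication period τ. As a rough illustration of that idea only, the sketch below simulates it in a single process on a toy quadratic objective: each worker takes local SGD steps, and every `tau` steps it exchanges an elastic correction with a shared center variable. This is not swFLOW's implementation. The function name `easgd`, the objective, and all parameter values except τ = 8 are assumptions chosen for the example.

```python
import random


def easgd(num_workers=4, tau=8, steps=400, lr=0.1, alpha=0.1, target=3.0, seed=0):
    """Toy EASGD on f(x) = 0.5 * (x - target)**2 with noisy gradients.

    Hypothetical single-process simulation; real workers would run in
    parallel and exchange the elastic term over the network.
    """
    rng = random.Random(seed)
    center = 0.0  # shared center variable
    workers = [rng.uniform(-1.0, 1.0) for _ in range(num_workers)]

    for t in range(steps):
        if t % tau == 0:
            # Communication round (once every tau steps): each worker is
            # pulled toward the center, and the center toward the workers.
            diffs = [alpha * (x - center) for x in workers]
            workers = [x - d for x, d in zip(workers, diffs)]
            center += sum(diffs)
        # Local SGD step on every worker; no communication is needed here.
        workers = [x - lr * ((x - target) + rng.gauss(0.0, 0.1)) for x in workers]

    return center, workers


if __name__ == "__main__":
    center, workers = easgd()
    print("center:", center)
    print("workers:", workers)
```

A larger `tau` means fewer communication rounds per local step, which is the trade-off the description refers to when it reports parallel efficiency at τ = 8.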