Showing 1 - 2 of 2 for search: '"Yao Chengji"'
Author:
Chang, Li-Wen, Bao, Wenlei, Hou, Qi, Jiang, Chengquan, Zheng, Ningxin, Zhong, Yinmin, Zhang, Xuanrun, Song, Zuquan, Yao, Chengji, Jiang, Ziheng, Lin, Haibin, Jin, Xin, Liu, Xin
Large deep learning models have demonstrated a strong ability to solve many tasks across a wide range of applications. These large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning …
External link:
http://arxiv.org/abs/2406.06858
Author:
Junmin Xiao, Ninghui Sun, Hu Zhongzhe, Tian Zhongbo, Zhu Hongrui, Yao Chengji, Guangming Tan, Xiaoyang Zhang
Published in:
ISPA/BDCloud/SocialCom/SustainCom
Large-batch distributed synchronous stochastic gradient descent (SGD) has been widely used to train deep neural networks on a distributed-memory system with multiple nodes, which can leverage parallel resources to reduce the number of iterative steps an …