Showing 1 - 10 of 62 for search: '"Lai, Zhiquan"'
The size of deep learning models has been increasing to enhance model quality. The linear increase in training computation budget with model size means that training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture
External link:
http://arxiv.org/abs/2411.10003
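The abstract above breaks off at "the Mixture", which in the context of scaling model size presumably refers to Mixture-of-Experts (MoE) layers. Below is a minimal, self-contained sketch of an MoE feed-forward layer with top-1 routing; it only illustrates the general MoE idea and is not the method of arXiv:2411.10003. The class name TinyMoE and all hyperparameters are made up for the example.

```python
# Hypothetical sketch of a Mixture-of-Experts (MoE) feed-forward layer with
# top-1 routing. Illustrative only; not the method proposed in arXiv:2411.10003.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)   # router scoring each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to its single best expert.
        scores = F.softmax(self.gate(x), dim=-1)       # (tokens, num_experts)
        weight, expert_idx = scores.max(dim=-1)        # top-1 gate value and index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(8, 64)
print(TinyMoE(d_model=64, d_hidden=128, num_experts=4)(tokens).shape)  # torch.Size([8, 64])
```

Because each token activates only one expert, the parameter count grows with the number of experts while per-token compute stays roughly constant, which is the usual motivation for MoE scaling.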
Large Language Models (LLMs) have demonstrated impressive performance across various domains. However, the enormous number of model parameters makes fine-tuning challenging, significantly limiting their application and deployment. Existing solutions
External link:
http://arxiv.org/abs/2407.17029
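The record above concerns fine-tuning LLMs whose full parameter set is too large to update cheaply. A common family of existing solutions is parameter-efficient fine-tuning; the sketch below shows a generic LoRA-style adapter (frozen base weight plus a trainable low-rank update) purely as an illustration of that family, not as the approach of arXiv:2407.17029. LoRALinear, rank, and alpha are illustrative choices.

```python
# Generic LoRA-style adapter: freeze the pretrained weight W and learn a
# low-rank update B @ A. One common parameter-efficient fine-tuning scheme,
# shown only as an illustration, not as the paper's method.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)         # frozen pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scale * x A^T B^T ; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)   # 16384 adapter parameters vs. ~1M frozen base parameters
```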
Author:
Li, Shengwei, Lai, Zhiquan, Hao, Yanqi, Liu, Weijie, Ge, Keshi, Deng, Xiaoge, Li, Dongsheng, Lu, Kai
Deep learning is experiencing a rise in foundation models that are expected to lead in various fields. The massive number of parameters necessitates the use of tensor model parallelism (TMP) in foundation model training. However, TMP requires frequent
External link:
http://arxiv.org/abs/2305.16121
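Tensor model parallelism splits the weight matrices of individual layers across devices, which is why it needs frequent collective communication to reassemble partial results. The single-device sketch below simulates a column-parallel linear layer to show the partitioning arithmetic; it illustrates TMP in general, not the specific communication scheme of arXiv:2305.16121.

```python
# Column-parallel linear layer, simulated on a single device: the weight is
# split along its output dimension into "shards", each shard computes a slice
# of the output, and an all-gather (here: torch.cat) reassembles the result.
import torch

torch.manual_seed(0)
x = torch.randn(4, 16)            # batch of activations
w = torch.randn(32, 16)           # full weight, out_features x in_features
num_shards = 4

shards = torch.chunk(w, num_shards, dim=0)          # each shard: (8, 16)
partial = [x @ shard.t() for shard in shards]       # each: (4, 8), one per "device"
y_parallel = torch.cat(partial, dim=1)              # all-gather along the output dim

y_reference = x @ w.t()                             # unsharded computation
print(torch.allclose(y_parallel, y_reference, atol=1e-6))   # True
```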
Author:
Lai, Zhiquan, Li, Shengwei, Tang, Xudong, Ge, Keshi, Liu, Weijie, Duan, Yabo, Qiao, Linbo, Li, Dongsheng
Published in:
IEEE Transactions on Parallel and Distributed Systems, vol. 34, no. 5, pp. 1466-1478, May 2023
Foundation models are becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset. Besides being computing-intensive, the training process
External link:
http://arxiv.org/abs/2206.04959
Author:
Tang, Yu, Wang, Chenyu, Zhang, Yufan, Liu, Yuliang, Zhang, Xingcheng, Qiao, Linbo, Lai, Zhiquan, Li, Dongsheng
The further development of deep neural networks is hampered by limited GPU memory resources. Therefore, the optimization of GPU memory resources is in high demand. Swapping and recomputation are commonly applied to make better use of GPU memory
External link:
http://arxiv.org/abs/2203.15980
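Recomputation (activation checkpointing) trades extra compute for memory: intermediate activations are dropped in the forward pass and recomputed during backward. The snippet below uses PyTorch's built-in torch.utils.checkpoint as a generic illustration of recomputation; the swapping side and the scheduling policy of arXiv:2203.15980 are not shown.

```python
# Activation recomputation with PyTorch gradient checkpointing: the block's
# activations are not stored during the forward pass and are recomputed in the
# backward pass. A generic illustration, not the paper's scheduling policy.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
x = torch.randn(32, 512, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)   # activations recomputed on backward
loss = y.pow(2).mean()
loss.backward()
print(x.grad.shape)   # torch.Size([32, 512])
```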
Author:
Wu, Changling, Ma, Bohui, McClements, David Julian, Lai, Zhiquan, Hou, Jie, Wang, Shuaizheng, Wang, Xinru, Qiu, Yuxin, Wu, Fenghua, Fang, Guanyu, Liu, Xingquan, Wang, Peng
Published in:
In Food Chemistry, 1 December 2024, 460 Part 2
Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well for dense models like image classification models, we find that these DL frameworks have relatively
External link:
http://arxiv.org/abs/2110.09132
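Data-parallel training replicates the model on every worker, feeds each replica a different mini-batch, and averages gradients after the backward pass. The sketch below shows the standard PyTorch DistributedDataParallel setup, degenerated to world_size=1 on CPU so it runs as a single process; it illustrates the baseline mechanism only, not the scalability issues the paper analyzes.

```python
# Minimal DistributedDataParallel (DDP) sketch. A real run launches one process
# per GPU via torchrun; here a single CPU process with the gloo backend suffices
# to show the API. Illustration only.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(128, 10))        # gradients are all-reduced across ranks
opt = torch.optim.SGD(model.parameters(), lr=0.01)

x, target = torch.randn(16, 128), torch.randint(0, 10, (16,))
loss = torch.nn.functional.cross_entropy(model(x), target)
loss.backward()                              # DDP averages gradients during backward
opt.step()
dist.destroy_process_group()
```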
The distributed stochastic gradient descent (SGD) approach has been widely used in large-scale deep learning, and the gradient collective method is vital to ensure the training scalability of the distributed deep learning system. Collective communication
External link:
http://arxiv.org/abs/2110.02140
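The gradient collective step is usually an all-reduce. The pure-Python simulation below walks through the reduce-scatter and all-gather phases of a ring all-reduce, the classic bandwidth-efficient layout in which each worker sends roughly 2(N-1)/N of the gradient size per iteration; it is a conceptual sketch, not the collective method studied in arXiv:2110.02140.

```python
# Conceptual ring all-reduce over simulated workers: a reduce-scatter phase
# followed by an all-gather phase. Each worker exchanges N-1 chunks per phase.
# Illustration only; real systems use NCCL/MPI collectives.
import numpy as np

def ring_allreduce(grads):
    """grads: list of equal-length 1-D arrays, one per simulated worker."""
    n = len(grads)
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Reduce-scatter: after n-1 steps, worker r holds the fully reduced chunk r.
    for step in range(n - 1):
        for r in range(n):
            src = (r - step - 1) % n                     # chunk this worker forwards
            chunks[(r + 1) % n][src] += chunks[r][src]

    # All-gather: circulate the reduced chunks so every worker has every sum.
    for step in range(n - 1):
        for r in range(n):
            src = (r - step) % n
            chunks[(r + 1) % n][src] = chunks[r][src].copy()

    return [np.concatenate(c) for c in chunks]

workers = [np.full(8, i, dtype=float) for i in range(4)]   # 4 workers, gradient of size 8
print(ring_allreduce(workers)[0])   # every element equals 0 + 1 + 2 + 3 = 6
```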
Graph neural networks (GNNs) have been proven to be mature enough for handling graph-structured data on node-level graph representation learning tasks. However, the graph pooling technique for learning expressive graph-level representations is critical
External link:
http://arxiv.org/abs/2104.05960
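Graph pooling (the readout) turns node-level embeddings into one vector per graph. The snippet below implements the simplest baseline, mean pooling over the nodes of each graph in a batch, only to make the term concrete; it is not the pooling technique of arXiv:2104.05960.

```python
# Baseline graph readout: mean-pool node embeddings into one vector per graph.
# Shown only to make "graph pooling" concrete; not the paper's technique.
import torch

def mean_pool(node_embeddings: torch.Tensor, graph_index: torch.Tensor) -> torch.Tensor:
    """node_embeddings: (num_nodes, dim); graph_index[i] = id of the graph node i belongs to."""
    num_graphs = int(graph_index.max()) + 1
    dim = node_embeddings.size(1)
    sums = torch.zeros(num_graphs, dim).index_add_(0, graph_index, node_embeddings)
    counts = torch.bincount(graph_index, minlength=num_graphs).clamp(min=1)
    return sums / counts.unsqueeze(1)

# Two graphs in one batch: nodes 0-2 belong to graph 0, nodes 3-4 to graph 1.
h = torch.randn(5, 16)
batch = torch.tensor([0, 0, 0, 1, 1])
print(mean_pool(h, batch).shape)   # torch.Size([2, 16])
```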
Author:
Wu, Changling, McClements, David Julian, Ma, Bohui, Lai, Zhiquan, Wu, Fenghua, Liu, Xingquan, Wang, Peng
Published in:
In International Journal of Biological Macromolecules, February 2024, 258 Part 2