Showing 1 - 10 of 50 for search: '"Li, Shenggui"'
Author:
Du, Cunxiao, Jiang, Jing, Yuanchen, Xu, Wu, Jiawei, Yu, Sicheng, Li, Yongqi, Li, Shenggui, Xu, Kai, Nie, Liqiang, Tu, Zhaopeng, You, Yang
Speculative decoding is a relatively new decoding framework that leverages small and efficient draft models to reduce the latency of LLMs. In this study, we introduce GliDe and CaPE, two low-hassle modifications to vanilla speculative decoding to further… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2402.02082
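The entry above mentions speculative decoding, where a small draft model proposes several tokens and the large target model only verifies them. The following is a minimal, hypothetical Python sketch of that draft-and-verify loop with toy stand-in models; the model functions, the greedy acceptance rule, and all names are illustrative assumptions, not the GliDe/CaPE method from the paper.

```python
import random

VOCAB = list(range(8))  # tiny toy vocabulary

def base_logits(tokens):
    # Deterministic pseudo-logits derived from the context, shared by both toy models.
    rng = random.Random(sum(tokens) + 17 * len(tokens))
    return [rng.random() for _ in VOCAB]

def draft_model(tokens):
    # Hypothetical cheap model: a slightly perturbed copy of the target's scores.
    rng = random.Random(len(tokens))
    return [s + 0.1 * rng.random() for s in base_logits(tokens)]

def target_model(tokens):
    # Hypothetical expensive model whose outputs we actually trust.
    return base_logits(tokens)

def greedy(scores):
    return max(range(len(scores)), key=lambda i: scores[i])

def speculative_decode(prompt, num_new_tokens, k=4):
    """Draft k tokens with the cheap model, then keep the longest prefix
    that the target model would also have chosen greedily."""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + num_new_tokens:
        # 1) Draft k candidate tokens autoregressively with the small model.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = greedy(draft_model(ctx))
            draft.append(t)
            ctx.append(t)
        # 2) Verify: in a real system the target scores all k positions in one
        #    batched forward pass; here we simply walk the draft sequentially.
        accepted, ctx = [], list(tokens)
        for t in draft:
            target_choice = greedy(target_model(ctx))
            if target_choice == t:
                accepted.append(t)
                ctx.append(t)
            else:
                accepted.append(target_choice)  # replace the first mismatch and stop
                break
        tokens.extend(accepted)  # at least one token is produced per target call
    return tokens[: len(prompt) + num_new_tokens]

print(speculative_decode([1, 2, 3], num_new_tokens=10))
```

The latency benefit comes from step 2: several drafted tokens are validated by a single (batched) pass of the large model, so the expensive model is invoked far fewer times per generated token. Production implementations usually replace the greedy match with a rejection-sampling acceptance rule over the two distributions.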
In recent years, large-scale models have demonstrated state-of-the-art performance across various domains. However, training such models requires various techniques to address the problem of limited computing power and memory on devices such as GPUs.
External link:
http://arxiv.org/abs/2302.02599
In recent years, large language models have achieved great success due to their unprecedented size. However, training these models poses a challenge for most researchers as it requires a substantial number of GPUs. To reduce GPU memory usage, memory…
External link:
http://arxiv.org/abs/2212.05339
Large transformer models display promising performance on a wide range of natural language processing (NLP) tasks. Although the AI community has expanded the model scale to the trillion parameter level, the practical deployment of 10-100 billion parameter…
External link:
http://arxiv.org/abs/2209.02341
Author:
Fang, Jiarui, Zhang, Geng, Han, Jiatong, Li, Shenggui, Bian, Zhengda, Li, Yongbin, Liu, Jin, You, Yang
Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit in GPU memory entirely. We propose a GPU-based software cache approach to dynamically manage the embedding… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2208.05321
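The entry above describes keeping the full embedding table in host memory while a software cache holds only the hot rows on the GPU. Below is a minimal, hypothetical sketch of such a cache with LRU eviction, using plain NumPy arrays to stand in for "CPU" and "GPU" storage; the class, method names, and policy details are assumptions for illustration, not the paper's actual system.

```python
from collections import OrderedDict
import numpy as np

class EmbeddingCache:
    """Full embedding table lives in host memory; a fixed-size, LRU-managed
    subset of rows lives in a device buffer and serves lookups."""

    def __init__(self, num_rows, dim, device_capacity):
        self.host_table = np.random.randn(num_rows, dim).astype(np.float32)      # "CPU" storage
        self.device_buffer = np.zeros((device_capacity, dim), dtype=np.float32)  # "GPU" storage
        self.slot_of_row = OrderedDict()            # row id -> device slot, kept in LRU order
        self.free_slots = list(range(device_capacity))

    def lookup(self, row_ids):
        """Return device-resident embeddings for a batch of row ids, copying
        missing rows host -> device and evicting the least-recently-used row."""
        out = np.empty((len(row_ids), self.device_buffer.shape[1]), dtype=np.float32)
        for i, rid in enumerate(row_ids):
            if rid in self.slot_of_row:                     # cache hit
                self.slot_of_row.move_to_end(rid)
            else:                                           # cache miss
                if self.free_slots:
                    slot = self.free_slots.pop()
                else:
                    # Evict the LRU row; during training a real cache would first
                    # write the (possibly updated) row back to the host table.
                    _, slot = self.slot_of_row.popitem(last=False)
                self.device_buffer[slot] = self.host_table[rid]  # host -> device copy
                self.slot_of_row[rid] = slot
            out[i] = self.device_buffer[self.slot_of_row[rid]]
        return out

cache = EmbeddingCache(num_rows=1_000_000, dim=16, device_capacity=1024)
print(cache.lookup([3, 42, 3, 999_999]).shape)  # (4, 16); the repeated id 3 is a hit
```

Because recommendation traffic is highly skewed toward popular ids, even a small device buffer achieves high hit rates, which is the property such a cache exploits.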
Author:
Zhang, Hao, Cui, Yingchun, Zong, Shi, Chen, Shaocong, Ma, Lijie, Wang, Weixuan, Wang, Xuejiao, Li, Shenggui, Liu, Chenguang
Published in:
In Marine Geology January 2025 479
Federated learning was proposed by Google to safeguard data privacy by training models locally on users' devices. However, with deep learning models growing in size to achieve better results, it becomes increasingly difficult to accommodate the whole… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2202.11836
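The entry above summarizes the federated setting: each client trains on its own data locally and only model updates leave the device. As a concrete illustration of the general pattern the snippet refers to, here is a minimal, hypothetical NumPy sketch of federated averaging (FedAvg) for a linear model; it is not the system proposed in the paper, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_train(weights, features, labels, lr=0.1, steps=20):
    """One client's local update: a few gradient steps of linear regression
    on private data; only the updated weights are returned to the server."""
    w = weights.copy()
    for _ in range(steps):
        grad = features.T @ (features @ w - labels) / len(labels)
        w -= lr * grad
    return w

def federated_averaging(global_w, client_data, rounds=5):
    """Each round: clients train locally, the server averages the weights
    (weighted by each client's sample count). Raw data never leaves a client."""
    for _ in range(rounds):
        updates, sizes = [], []
        for features, labels in client_data:
            updates.append(local_train(global_w, features, labels))
            sizes.append(len(labels))
        total = sum(sizes)
        global_w = sum(w * (n / total) for w, n in zip(updates, sizes))
    return global_w

# Three simulated clients whose private data come from the same true model.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + 0.01 * rng.normal(size=50)
    clients.append((X, y))

w = federated_averaging(np.zeros(2), clients)
print(np.round(w, 2))  # close to [ 2. -1.]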
Author:
Li, Shenggui, Liu, Hongxin, Bian, Zhengda, Fang, Jiarui, Huang, Haichen, Liu, Yuliang, Wang, Boxiang, You, Yang
The success of Transformer models has pushed the deep learning model scale to billions of parameters. However, due to the limited memory resource of a single GPU, the best practice for choosing the optimal parallel strategy is still lacking, since it… (an illustrative sketch follows this entry)
External link:
http://arxiv.org/abs/2110.14883
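The entry above concerns choosing among parallel training strategies once a model no longer fits on a single GPU. As one concrete example of what such a strategy does, here is a minimal, hypothetical NumPy sketch of 1D tensor parallelism for a single linear layer: the weight matrix is split column-wise across simulated "devices" and the partial outputs are gathered, matching the unsharded result. It only illustrates the idea and is not Colossal-AI's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single linear layer y = x @ W, conceptually too large for one device.
x = rng.normal(size=(4, 8))          # activations, replicated on every device
W = rng.normal(size=(8, 6))          # full weight matrix

# 1D tensor parallelism: split W column-wise across `world_size` devices.
world_size = 2
W_shards = np.split(W, world_size, axis=1)      # each device holds an (8, 3) shard

# Each device computes its partial output independently...
partial_outputs = [x @ W_k for W_k in W_shards]

# ...and an all-gather along the feature dimension reconstructs the full output.
y_parallel = np.concatenate(partial_outputs, axis=1)

# Sanity check: identical to the single-device computation.
print(np.allclose(y_parallel, x @ W))  # True
```

Pipeline parallelism and ZeRO-style data parallelism partition the model along different dimensions (layers and optimizer state, respectively); picking and combining these partitionings per model and cluster is exactly the strategy-selection problem the abstract refers to.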
The pre-trained model (PTM) is revolutionizing Artificial Intelligence (AI) technology. However, the hardware requirement of PTM training is prohibitively high, making it accessible to only a small proportion of people. Therefore, we propose the PatrickStar system…
External link:
http://arxiv.org/abs/2108.05818
Efficient GPU resource scheduling is essential to maximize resource utilization and save training costs for the increasing number of deep learning workloads in shared GPU clusters. Existing GPU schedulers largely rely on static policies to leverage the…
External link:
http://arxiv.org/abs/2108.03645