Showing 1 - 10 of 27 for search: '"Zheng, Zangwei"'
Author:
Qin, Ziheng, Xu, Zhaopan, Zhou, Yukun, Zheng, Zangwei, Cheng, Zebang, Tang, Hao, Shang, Lei, Sun, Baigui, Peng, Xiaojiang, Timofte, Radu, Yao, Hongxun, Wang, Kai, You, Yang
Deep learning benefits from the growing abundance of available data. Meanwhile, efficiently dealing with the growing data scale has become a challenge. Data publicly available are from different sources with various qualities, and it is impractical to …
External link:
http://arxiv.org/abs/2405.18347
The increase in parameter size of multimodal large language models (MLLMs) introduces significant capabilities, particularly in-context learning, where MLLMs enhance task performance without updating pre-trained parameters. This effectiveness, however, …
External link:
http://arxiv.org/abs/2404.12866
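As a minimal sketch of the in-context learning setting the abstract describes (not this paper's method), the model's weights stay frozen and the "learning" happens entirely through demonstrations placed in the prompt. `query_model` below is a hypothetical stand-in for any frozen (M)LLM inference call.

```python
# Minimal sketch of in-context learning: no gradient step, no parameter update.
# `query_model` is a hypothetical placeholder for a frozen (M)LLM inference call.

def build_few_shot_prompt(demonstrations, query):
    """Concatenate (input, label) demonstrations before the actual query."""
    lines = []
    for x, y in demonstrations:
        lines.append(f"Input: {x}\nLabel: {y}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

demos = [("the movie was wonderful", "positive"),
         ("a waste of two hours", "negative")]
prompt = build_few_shot_prompt(demos, "an unexpectedly moving finale")
# answer = query_model(prompt)  # task performance comes from the demonstrations alone
```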
Author:
Zhao, Xuanlei, Cheng, Shenggan, Chen, Chang, Zheng, Zangwei, Liu, Ziming, Yang, Zheming, You, Yang
Scaling multi-dimensional transformers to long sequences is indispensable across various domains. However, the challenges of large memory requirements and slow speeds of such sequences necessitate sequence parallelism. All existing approaches fall under …
External link:
http://arxiv.org/abs/2403.10266
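To illustrate the general idea behind sequence parallelism (not this paper's specific scheme), the sketch below shards activations along the sequence axis so each worker only ever holds a fraction of a long sequence; shapes and the `world_size` split are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of sequence parallelism: shard activations along the
# sequence axis so each of `world_size` workers holds seq_len / world_size tokens.
batch, seq_len, hidden, world_size = 2, 8192, 1024, 4
activations = np.random.randn(batch, seq_len, hidden).astype(np.float32)

# Each worker keeps one contiguous shard of the sequence dimension.
shards = np.split(activations, world_size, axis=1)

# Token-wise ops (layer norm, MLP) run independently on each shard; attention
# across shards would additionally require communication (e.g. gathering
# keys/values), which is where the different parallelism approaches differ.
local_out = [shard * 2.0 for shard in shards]   # stand-in for a token-wise op
full_out = np.concatenate(local_out, axis=1)
assert full_out.shape == activations.shape
```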
Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios. Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited, as evidenced by open …
External link:
http://arxiv.org/abs/2403.00798
To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B …
External link:
http://arxiv.org/abs/2402.01739
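The core mechanism of an MoE layer in a decoder-only LLM is top-k expert routing: a gating network picks a few experts per token, so only a fraction of the parameters are active for each token. The toy sketch below illustrates that idea only; it is not OpenMoE's actual implementation, and all names are illustrative.

```python
import numpy as np

# Toy sketch of top-k expert routing (illustrative, not OpenMoE's implementation).
def moe_layer(x, gate_w, experts, k=2):
    """x: (tokens, hidden); gate_w: (hidden, n_experts); experts: list of (hidden, hidden)."""
    logits = x @ gate_w                                   # router scores per token
    topk = np.argsort(logits, axis=-1)[:, -k:]            # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    weights = np.exp(sel) / np.exp(sel).sum(axis=-1, keepdims=True)  # softmax over selected
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(k):
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ experts[e])  # only k experts run per token
    return out

tokens, hidden, n_experts = 4, 8, 4
x = np.random.randn(tokens, hidden)
gate_w = np.random.randn(hidden, n_experts)
experts = [np.random.randn(hidden, hidden) for _ in range(n_experts)]
y = moe_layer(x, gate_w, experts)
```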
Adaptive gradient methods, such as Adam and LAMB, have demonstrated excellent performance in the training of large language models. Nevertheless, the need for adaptivity requires maintaining second-moment estimates of the per-parameter gradients, which …
External link:
http://arxiv.org/abs/2307.02047
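To make the memory cost mentioned above concrete, here is a minimal sketch of a standard Adam update: besides the parameters themselves, the optimizer keeps first- and second-moment estimates (m, v) of the same shape per parameter. This shows plain Adam only, not the memory-efficient method proposed in the paper.

```python
import numpy as np

# Standard Adam step: m and v are extra per-parameter state the optimizer must store.
def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad          # first moment (running mean of gradients)
    v = b2 * v + (1 - b2) * grad ** 2     # second moment (per-parameter, extra memory)
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.random.randn(1000)
m = np.zeros_like(param)                  # optimizer state 1
v = np.zeros_like(param)                  # optimizer state 2: the second-moment estimate
grad = np.random.randn(1000)
param, m, v = adam_step(param, grad, m, v, t=1)
```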
Recent research has highlighted the importance of dataset size in scaling language models. However, large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit …
External link:
http://arxiv.org/abs/2305.13230
Large language models (LLMs) have revolutionized the field of AI, demonstrating unprecedented capacity across various tasks. However, the inference process for LLMs comes with significant computational costs. In this paper, we propose an efficient LLM …
External link:
http://arxiv.org/abs/2305.13144
Continual learning (CL) can help pre-trained vision-language models efficiently adapt to new or under-trained data distributions without re-training. Nevertheless, during the continual training of the Contrastive Language-Image Pre-training (CLIP) model, …
External link:
http://arxiv.org/abs/2303.06628
Author:
Qin, Ziheng, Wang, Kai, Zheng, Zangwei, Gu, Jianyang, Peng, Xiangyu, Xu, Zhaopan, Zhou, Daquan, Shang, Lei, Sun, Baigui, Xie, Xuansong, You, Yang
Data pruning aims to obtain lossless performances with less overall cost. A common approach is to filter out samples that make less contribution to the training. This could lead to gradient expectation bias compared to the original data. To solve this, …
External link:
http://arxiv.org/abs/2303.04947
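The gradient expectation bias mentioned in this abstract can be illustrated with a small sketch: if pruning naively keeps only "high-contribution" samples, the mean gradient over the kept subset drifts away from the expectation over the full dataset. The toy 1-D gradients and the magnitude-based filter below are illustrative assumptions, not the paper's actual correction method.

```python
import numpy as np

# Toy illustration of gradient expectation bias under naive pruning.
rng = np.random.default_rng(0)
per_sample_grads = rng.normal(loc=0.5, scale=2.0, size=(10_000, 1))  # toy 1-D gradients

full_mean = per_sample_grads.mean(axis=0)

# Naive pruning: drop the samples with the smallest gradient magnitude.
scores = np.abs(per_sample_grads).ravel()
keep = scores > np.quantile(scores, 0.5)           # keep the "top" 50% by magnitude
pruned_mean = per_sample_grads[keep].mean(axis=0)

print("full-data gradient mean:    ", full_mean)
print("pruned-subset gradient mean:", pruned_mean)  # drifts away from full_mean
```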