Showing 1 - 10 of 275 for search: '"Zhao Hanyu"'
Author:
Wang, Liangdong; Zhang, Bo-Wen; Wu, Chengwei; Zhao, Hanyu; Shi, Xiaofeng; Gu, Shuhao; Li, Jijie; Ma, Quanyue; Pan, TengFei; Liu, Guang
We present CCI3.0-HQ (https://huggingface.co/datasets/BAAI/CCI3-HQ), a high-quality 500GB subset of the Chinese Corpora Internet 3.0 (CCI3.0) (https://huggingface.co/datasets/BAAI/CCI3-Data), developed using a novel two-stage hybrid filtering pipeline…
External link:
http://arxiv.org/abs/2410.18505
Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks. In this work, we demonstrate that the order of training data can lead to significant training imbalances, potentially resulting in performance degradation…
External link:
http://arxiv.org/abs/2410.03743
With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs). Previous research mainly focuses on selecting individual high-quality…
External link:
http://arxiv.org/abs/2409.07045
Author:
Zhang, Xinyi; Zhao, Hanyu; Xiao, Wencong; Jia, Xianyan; Xu, Fei; Li, Yong; Lin, Wei; Liu, Fangming
The era of large deep learning models has given rise to advanced training strategies such as 3D parallelism and the ZeRO series. These strategies enable various (re-)configurable execution plans for a training job, which exhibit remarkably different…
External link:
http://arxiv.org/abs/2408.08586
Author:
Zhang, Bo-Wen; Wang, Liangdong; Yuan, Ye; Li, Jijie; Gu, Shuhao; Zhao, Mengdi; Wu, Xinya; Liu, Guang; Wu, Chengwei; Zhao, Hanyu; Du, Li; Ju, Yiming; Ma, Quanyue; Ao, Yulong; Zhao, Yingli; Zhu, Songhe; Cao, Zhou; Liang, Dong; Lin, Yonghua; Zhang, Ming; Wang, Shunfei; Zhou, Yanxin; Ye, Min; Chen, Xuekai; Yu, Xinyang; Huang, Xiangjun; Yang, Jian
In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch…
External link:
http://arxiv.org/abs/2408.06567
Author:
Dong, Jianbo; Luo, Bin; Zhang, Jun; Zhang, Pengcheng; Feng, Fei; Zhu, Yikai; Liu, Ang; Chen, Zian; Shi, Yi; Jiao, Hairong; Lu, Gang; Guan, Yu; Zhai, Ennan; Xiao, Wencong; Zhao, Hanyu; Yuan, Man; Yang, Siran; Li, Xiang; Wang, Jiamang; Men, Rui; Zhang, Jianwei; Zhong, Huang; Cai, Dennis; Xie, Yuan; Fu, Binzhang
The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training…
External link:
http://arxiv.org/abs/2406.04594
Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because the requests are inherently heterogeneous and unpredictable in terms…
External link:
http://arxiv.org/abs/2406.03243
Prior drift is a crucial issue for Continual Test-Time Adaptation (CTTA) methods that use only unlabeled test data, as it can cause significant error propagation. In this paper, we introduce VCoTTA, a variational Bayesian approach to measure uncertainties…
External link:
http://arxiv.org/abs/2402.08182
Author:
Lin, Bin; Zhang, Chen; Peng, Tao; Zhao, Hanyu; Xiao, Wencong; Sun, Minmin; Liu, Anmin; Zhang, Zhipeng; Li, Lanbo; Qiu, Xiafei; Li, Shen; Ji, Zhigang; Xie, Tao; Li, Yong; Lin, Wei
Large Language Models (LLMs) demonstrate substantial potential across a diverse array of domains via request serving. However, as trends continue to push for expanding context sizes, the autoregressive nature of LLMs results in highly dynamic behavior…
External link:
http://arxiv.org/abs/2401.02669
As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However,…
External link:
http://arxiv.org/abs/2310.19295