Showing 1 - 10 of 15 results for the search: '"Xiong, Yizhe"'
Author:
Lian, Haoran, Xiong, Yizhe, Lin, Zijia, Niu, Jianwei, Mo, Shasha, Chen, Hui, Liu, Peng, Ding, Guiguang
The prevalent use of Byte Pair Encoding (BPE) in Large Language Models (LLMs) facilitates robust handling of subword units and avoids issues of out-of-vocabulary words. Despite its success, a critical challenge persists: long tokens, rich in semantic…
External link:
http://arxiv.org/abs/2411.05504
Author:
Su, Zhenpeng, Wu, Xing, Lin, Zijia, Xiong, Yizhe, Lv, Minxuan, Ma, Guangyuan, Chen, Hui, Hu, Songlin, Ding, Guiguang
Large language models (LLMs) have been attracting much attention from the community recently, due to their remarkable performance in all kinds of downstream tasks. According to the well-known scaling law, scaling up a dense LLM enhances its capabilities…
External link:
http://arxiv.org/abs/2410.16077
Author:
Su, Zhenpeng, Lin, Zijia, Bai, Xue, Wu, Xing, Xiong, Yizhe, Lian, Haoran, Ma, Guangyuan, Chen, Hui, Ding, Guiguang, Zhou, Wei, Hu, Songlin
Scaling the size of a model enhances its capabilities but significantly increases computation complexity. Mixture-of-Experts (MoE) models address the issue by allowing model size to scale up without substantially increasing training or inference cost…
External link:
http://arxiv.org/abs/2407.09816
Author:
Lian, Haoran, Xiong, Yizhe, Niu, Jianwei, Mo, Shasha, Su, Zhenpeng, Lin, Zijia, Chen, Hui, Liu, Peng, Han, Jungong, Ding, Guiguang
Byte Pair Encoding (BPE) serves as a foundational method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance…
External link:
http://arxiv.org/abs/2404.17808
Author:
Xiong, Yizhe, Chen, Xiansheng, Ye, Xin, Chen, Hui, Lin, Zijia, Lian, Haoran, Su, Zhenpeng, Niu, Jianwei, Ding, Guiguang
Recently, Large Language Models (LLMs) have been adopted across a wide range of tasks, leading to increasing attention towards research on how scaling LLMs affects their performance. Existing works, termed Scaling Laws, have discovered that…
External link:
http://arxiv.org/abs/2404.17785
Author:
Xiong, Yizhe, Chen, Hui, Hao, Tianxiang, Lin, Zijia, Han, Jungong, Zhang, Yuesong, Wang, Guoxin, Bao, Yongjun, Ding, Guiguang
Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and…
External link:
http://arxiv.org/abs/2403.09192
Unsupervised domain adaptation aims to transfer knowledge from a fully-labeled source domain to an unlabeled target domain. However, in real-world scenarios, providing abundant labeled data even in the source domain can be infeasible due to the difficulty…
External link:
http://arxiv.org/abs/2309.15575
Academic article
This result cannot be displayed to users who are not logged in. Log in to view this result.
Academic article
This result cannot be displayed to users who are not logged in. Log in to view this result.
Academic article
This result cannot be displayed to users who are not logged in. Log in to view this result.