Showing 1 - 10 of 1,419 for search: '"Furu, P"'
Author:
Chang, Yaoyao, Cui, Lei, Dong, Li, Huang, Shaohan, Huang, Yangyu, Huang, Yupan, Li, Scarlett, Lv, Tengchao, Ma, Shuming, Sun, Qinzheng, Wang, Wenhui, Wei, Furu, Xin, Ying, Yang, Mao, Yin, Qiufeng, Zhang, Xingxing
Pre-training Large Language Models (LLMs) on high-quality, meticulously curated datasets is widely recognized as critical for enhancing their performance and generalization capabilities. This study explores the untapped potential of Common Crawl as a…
External link:
http://arxiv.org/abs/2412.03398
Multi-Head Mixture-of-Experts (MH-MoE) demonstrates superior performance by using the multi-head mechanism to collectively attend to information from various representation spaces within different experts. In this paper, we present a novel implementation…
External link:
http://arxiv.org/abs/2411.16205
Preference optimization techniques, such as Direct Preference Optimization (DPO), are frequently employed to enhance the reasoning capabilities of large language models (LLMs) in domains like mathematical reasoning and coding, typically following supervised…
External link:
http://arxiv.org/abs/2411.16345
Recent research on the 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, enabling 4-bit activations…
External link:
http://arxiv.org/abs/2411.04965
Image aesthetics is a crucial metric in the field of image generation. However, textual aesthetics has not been sufficiently explored. With the widespread application of large language models (LLMs), previous work has primarily focused on the correct…
External link:
http://arxiv.org/abs/2411.02930
Author:
Li, Zongyi, Hu, Shujie, Liu, Shujie, Zhou, Long, Choi, Jeongsoo, Meng, Lingwei, Guo, Xun, Li, Jinyu, Ling, Hefei, Wei, Furu
Text-to-video models have recently undergone rapid and substantial advancements. Nevertheless, due to limitations in data and computational resources, achieving efficient generation of long videos with rich motion dynamics remains a significant challenge…
External link:
http://arxiv.org/abs/2410.20502
Author:
Zhang, Hengyuan, Shang, Chenming, Wang, Sizhe, Zhang, Dongdong, Sun, Renliang, Yu, Yiyao, Yang, Yujiu, Wei, Furu
Although fine-tuning Large Language Models (LLMs) with multilingual data can rapidly enhance their multilingual capabilities, they still exhibit a performance gap between the dominant language (e.g., English) and non-dominant ones due to the im…
External link:
http://arxiv.org/abs/2410.19453
Synthetic data generation has become an increasingly popular way of training models without the need for large, manually labeled datasets. For tasks like text embedding, synthetic data offers diverse and scalable training examples, significantly reducing…
External link:
http://arxiv.org/abs/2410.18634
Author:
Wang, Jinheng, Zhou, Hansong, Song, Ting, Mao, Shaoguang, Ma, Shuming, Wang, Hongyu, Xia, Yan, Wei, Furu
Recent advances in 1-bit Large Language Models (LLMs), such as BitNet and BitNet b1.58, present a promising approach to enhancing the efficiency of LLMs in terms of speed and energy consumption. These developments also enable local LLM deployment across…
External link:
http://arxiv.org/abs/2410.16144
Author:
Lin, Fangru, Mao, Shaoguang, La Malfa, Emanuele, Hofmann, Valentin, de Wynter, Adrian, Yao, Jing, Chen, Si-Qing, Wooldridge, Michael, Wei, Furu
Language is not monolithic. While many benchmarks are used as proxies to systematically estimate Large Language Model (LLM) performance in real-life tasks, they tend to ignore the nuances of within-language variation and thus fail to model the experience…
External link:
http://arxiv.org/abs/2410.11005