Showing 1 - 10 of 8,949 for search: '"An, Yuxian"'
Author:
Liu, Zhijian, Zhu, Ligeng, Shi, Baifeng, Zhang, Zhuoyang, Lou, Yuming, Yang, Shang, Xi, Haocheng, Cao, Shiyi, Gu, Yuxian, Li, Dacheng, Li, Xiuyu, Fang, Yunhao, Chen, Yukang, Hsieh, Cheng-Yu, Huang, De-An, Cheng, An-Chieh, Nath, Vishwesh, Hu, Jinyi, Liu, Sifei, Krishna, Ranjay, Xu, Daguang, Wang, Xiaolong, Molchanov, Pavlo, Kautz, Jan, Yin, Hongxu, Han, Song, Lu, Yao
Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy.
External link:
http://arxiv.org/abs/2412.04468
The dynamics of giant planet magnetospheres is controlled by a complex interplay between their fast rotation, their interaction with the solar wind, and their diverse internal plasma and momentum sources. In the ionosphere, the Hall and Pedersen cond
External link:
http://arxiv.org/abs/2412.04219
Author:
Yang, Ziyi, Zhang, Zaibin, Zheng, Zirui, Jiang, Yuxian, Gan, Ziyue, Wang, Zhiyu, Ling, Zijian, Chen, Jinsong, Ma, Martz, Dong, Bowen, Gupta, Prateek, Hu, Shuyue, Yin, Zhenfei, Li, Guohao, Jia, Xu, Wang, Lijun, Ghanem, Bernard, Lu, Huchuan, Lu, Chaochao, Ouyang, Wanli, Qiao, Yu, Torr, Philip, Shao, Jing
There has been a growing interest in enhancing rule-based agent-based models (ABMs) for social media platforms (i.e., X, Reddit) with more realistic large language model (LLM) agents, thereby allowing for a more nuanced study of complex systems. As a
External link:
http://arxiv.org/abs/2411.11581
Knowledge distillation (KD) is widely used to train small, high-performing student language models (LMs) using large teacher LMs. While effective in fine-tuning, KD during pre-training faces challenges in efficiency, flexibility, and effectiveness. E
External link:
http://arxiv.org/abs/2410.17215
This work investigates the selection of high-quality pre-training data from massive corpora to enhance LMs' capabilities for downstream usage. We formulate data selection as a generalized Optimal Control problem, which can be solved theoretically by
External link:
http://arxiv.org/abs/2410.07064
Author:
Liu, Weiwen, Huang, Xu, Zeng, Xingshan, Hao, Xinlong, Yu, Shuai, Li, Dexun, Wang, Shuai, Gan, Weinan, Liu, Zhengying, Yu, Yuanqing, Wang, Zezhong, Wang, Yuxian, Ning, Wu, Hou, Yutai, Wang, Bin, Wu, Chuhan, Wang, Xinzhi, Liu, Yong, Wang, Yasheng, Tang, Duyu, Tu, Dandan, Shang, Lifeng, Jiang, Xin, Tang, Ruiming, Lian, Defu, Liu, Qun, Chen, Enhong
Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and
External link:
http://arxiv.org/abs/2409.00920
In the field of large language models (LLMs), Knowledge Distillation (KD) is a critical technique for transferring capabilities from teacher models to student models. However, existing KD methods face limitations and challenges in distillation of LLM
External link:
http://arxiv.org/abs/2406.19774
Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards bette
External link:
http://arxiv.org/abs/2406.14491
This work studies the general principles of improving the learning of language models (LMs), which aims at reducing the necessary training steps for achieving superior performance. Specifically, we present a theory for the optimal learning of LMs. We
External link:
http://arxiv.org/abs/2402.17759
Author:
Li, Haoran, Dong, Qingxiu, Tang, Zhengyang, Wang, Chaojun, Zhang, Xingxing, Huang, Haoyang, Huang, Shaohan, Huang, Xiaolong, Huang, Zeqiang, Zhang, Dongdong, Gu, Yuxian, Cheng, Xin, Wang, Xun, Chen, Si-Qing, Dong, Li, Lu, Wei, Sui, Zhifang, Wang, Benyou, Lam, Wai, Wei, Furu
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs). Unlike prior work that relies on seed examples or existing datasets to construct instruction tuning data,
External link:
http://arxiv.org/abs/2402.13064