Showing 1 - 10 of 13 for search: '"Ju, Yiming"'
Author:
Ju, Yiming, Ma, Huanhuan
In 2022, with the release of ChatGPT, large-scale language models gained widespread attention. ChatGPT not only surpassed previous models in the number of parameters and the scale of its pretraining corpus but also achieved revolutionary performance improvements…
External link:
http://arxiv.org/abs/2411.07715
Supervised fine-tuning (SFT) is crucial for adapting Large Language Models (LLMs) to specific tasks. In this work, we demonstrate that the order of training data can lead to significant training imbalances, potentially resulting in performance degradation…
External link:
http://arxiv.org/abs/2410.03743
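The snippet above names a data-ordering problem rather than a method, so as a minimal, hedged sketch (not this paper's technique), the Python below shows the common remedy of shuffling a mixed-source SFT dataset with a fixed seed so that no single source is concentrated at one end of training; all names are hypothetical.

```python
import random

# Hypothetical SFT examples from two sources; unshuffled, all of
# source A would be seen before any of source B.
dataset = [{"source": "A", "text": f"instruction {i}"} for i in range(5)] \
        + [{"source": "B", "text": f"instruction {i}"} for i in range(5)]

rng = random.Random(42)   # fixed seed for reproducibility
rng.shuffle(dataset)      # interleave the sources across the epoch

print([ex["source"] for ex in dataset])
# e.g. ['B', 'A', 'A', 'B', ...] -- sources mixed throughout training
```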
With the availability of various instruction datasets, a pivotal challenge is how to effectively select and integrate these instructions to fine-tune large language models (LLMs). Previous research mainly focuses on selecting individual high-quality instructions…
External link:
http://arxiv.org/abs/2409.07045
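The abstract is cut off before the paper's own approach; as a hedged sketch of the baseline it contrasts against (selecting individual high-quality instructions), the snippet below ranks candidates with a toy quality score and keeps the top k. The scoring heuristic is an assumption for illustration, not the paper's criterion.

```python
def quality_score(example: dict) -> float:
    # Hypothetical heuristic: prefer longer, non-duplicate responses.
    # Real pipelines use reward models or LLM judges instead.
    return len(example["response"]) * (0.0 if example["duplicate"] else 1.0)

candidates = [
    {"instruction": "Summarize the article.", "response": "A concise summary.", "duplicate": False},
    {"instruction": "Translate to French.", "response": "Une traduction complete.", "duplicate": False},
    {"instruction": "Copy the input.", "response": "x", "duplicate": True},
]

k = 2
selected = sorted(candidates, key=quality_score, reverse=True)[:k]
print([ex["instruction"] for ex in selected])
# -> ['Translate to French.', 'Summarize the article.']
```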
Author:
Zhang, Bo-Wen, Wang, Liangdong, Yuan, Ye, Li, Jijie, Gu, Shuhao, Zhao, Mengdi, Wu, Xinya, Liu, Guang, Wu, Chengwei, Zhao, Hanyu, Du, Li, Ju, Yiming, Ma, Quanyue, Ao, Yulong, Zhao, Yingli, Zhu, Songhe, Cao, Zhou, Liang, Dong, Lin, Yonghua, Zhang, Ming, Wang, Shunfei, Zhou, Yanxin, Ye, Min, Chen, Xuekai, Yu, Xinyang, Huang, Xiangjun, Yang, Jian
In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch…
External link:
http://arxiv.org/abs/2408.06567
Author:
Xing, Xingrun, Zhang, Zheng, Ni, Ziyi, Xiao, Shitao, Ju, Yiming, Fan, Siqi, Wang, Yequan, Zhang, Jiajun, Li, Guoqi
Toward energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) have the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models…
External link:
http://arxiv.org/abs/2406.03287
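To make the SNN ingredients named above (event-driven sparsity, binary activation) concrete, here is a minimal leaky integrate-and-fire neuron in NumPy; this is a generic textbook sketch, not the architecture from this paper.

```python
import numpy as np

def lif_step(v, x, tau=2.0, v_th=1.0):
    # One step of a leaky integrate-and-fire neuron: leak the membrane
    # potential toward the input, emit a binary spike at threshold,
    # then hard-reset the spiking units.
    v = v + (x - v) / tau
    spike = (v >= v_th).astype(np.float32)  # binary activation
    v = v * (1.0 - spike)                   # reset after spiking
    return v, spike

rng = np.random.default_rng(0)
v = np.zeros(4)
for t in range(5):
    v, s = lif_step(v, rng.uniform(0.0, 1.0, size=4))
    print(t, s)  # sparse, event-driven binary outputs
```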
Recently, the Locate-Then-Edit paradigm has emerged as one of the main approaches to changing factual knowledge stored in language models. However, there is a lack of research on whether present locating methods can pinpoint the exact parameters embedding the factual knowledge…
External link:
http://arxiv.org/abs/2309.16535
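For readers unfamiliar with Locate-Then-Edit, the toy sketch below shows the "edit" half in its simplest form: treat one located weight matrix as a key-value memory and apply a rank-one update so a chosen key maps to a new value. This is a generic illustration under that assumption, not this paper's locating analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))   # hypothetical "located" layer
k = rng.normal(size=8)        # key vector for the fact to edit
v_new = rng.normal(size=8)    # desired new output for that key

# Rank-one update: afterwards W_edited @ k == v_new exactly, while
# directions orthogonal to k are left untouched.
W_edited = W + np.outer(v_new - W @ k, k) / (k @ k)

print(np.allclose(W_edited @ k, v_new))  # True
```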
We present a general framework for unsupervised text style transfer with deep generative models. The framework models each sentence-label pair in the non-parallel corpus as partially observed from a complete quadruplet which additionally contains two…
External link:
http://arxiv.org/abs/2308.16584
The opaqueness of deep NLP models has motivated the development of methods for interpreting how deep models predict. Recent work has introduced hierarchical attribution, which produces a hierarchical clustering of words along with an attribution score…
External link:
http://arxiv.org/abs/2210.13270
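As a hedged illustration of what a hierarchical clustering of words with attribution scores can look like (a greedy toy, not the algorithm proposed in the paper), the sketch below repeatedly merges the adjacent spans whose scores are closest and prints the resulting hierarchy; the words and scores are made up.

```python
words = ["the", "movie", "was", "not", "bad"]
scores = [0.0, 0.1, 0.0, 0.6, 0.7]   # hypothetical per-word attributions

spans = [([w], s) for w, s in zip(words, scores)]
while len(spans) > 1:
    # merge the adjacent pair with the most similar attribution scores
    i = min(range(len(spans) - 1),
            key=lambda j: abs(spans[j][1] - spans[j + 1][1]))
    merged = (spans[i][0] + spans[i + 1][0],
              (spans[i][1] + spans[i + 1][1]) / 2)
    print("merge:", merged[0], "score:", round(merged[1], 2))
    spans[i:i + 2] = [merged]
```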
Modern deep learning models are notoriously opaque, which has motivated the development of methods for interpreting how deep models predict. This goal is usually approached with attribution methods, which assess the influence of features on model predictions…
External link:
http://arxiv.org/abs/2109.05463
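Since this entry defines attribution only abstractly, here is one of the simplest concrete instances, leave-one-out occlusion: a feature's influence is the drop in the model's score when that feature is zeroed out. The toy linear model is an assumption for illustration.

```python
import numpy as np

def occlusion_attribution(model, x):
    # Influence of each feature = score drop when it is zeroed out.
    base = model(x)
    attributions = np.empty_like(x)
    for i in range(len(x)):
        x_masked = x.copy()
        x_masked[i] = 0.0
        attributions[i] = base - model(x_masked)
    return attributions

w = np.array([2.0, -1.0, 0.5])
model = lambda x: float(w @ x)   # toy scalar-valued predictor
print(occlusion_attribution(model, np.array([1.0, 1.0, 1.0])))
# -> [ 2.  -1.   0.5]  (each feature's exact contribution, since linear)
```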
Academic article
This result is not available to users who are not signed in; sign in to view it.