Showing 1 - 10 of 47 results for the search: '"Hao, Yongchang"'
The performance of neural networks improves when more parameters are used. However, the model sizes are constrained by the available on-device memory during training and inference. Although applying techniques like quantization can alleviate the constraint…
External link:
http://arxiv.org/abs/2410.20650
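As a generic illustration of the weight quantization this entry mentions, the following is a minimal NumPy sketch of symmetric per-tensor int8 quantization; it is a textbook-style example with made-up helper names, not the method of the linked paper.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0 if w.size else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Map the int8 codes back to float32 for computation."""
    return q.astype(np.float32) * scale

# A float32 weight matrix uses 4 bytes per entry; its int8 copy uses 1.
w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes, q.nbytes)                            # 4194304 vs 1048576
print(np.abs(w - dequantize_int8(q, scale)).max())   # worst-case rounding error
```

Storing the weights as int8 cuts their footprint to a quarter of float32 at the cost of the rounding error printed above.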
Large language models have become increasingly popular and demonstrated remarkable performance in various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained…
External link:
http://arxiv.org/abs/2409.12500
Second-order optimization approaches like the generalized Gauss-Newton method are considered more powerful as they utilize the curvature information of the objective function with preconditioning matrices. Albeit offering tempting theoretical benefits…
External link:
http://arxiv.org/abs/2402.03295
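As a generic reference for the preconditioning idea this entry mentions, the sketch below applies one damped generalized Gauss-Newton step to a least-squares objective, where the GGN matrix JᵀJ preconditions the gradient Jᵀr. It is an illustration under those standard assumptions, not the algorithm of the linked paper.

```python
import numpy as np

def ggn_step(theta, residual_fn, jacobian_fn, damping=1e-3):
    """One damped generalized Gauss-Newton step for 0.5 * ||r(theta)||^2.

    The GGN matrix J^T J (J = Jacobian of the residuals) approximates the
    Hessian and preconditions the gradient J^T r.
    """
    r = residual_fn(theta)          # residual vector, shape (m,)
    J = jacobian_fn(theta)          # Jacobian, shape (m, n)
    grad = J.T @ r                  # gradient of the objective
    ggn = J.T @ J + damping * np.eye(theta.size)
    return theta - np.linalg.solve(ggn, grad)

# Toy linear regression: residuals r = A @ theta - b.
A = np.random.randn(50, 5)
b = np.random.randn(50)
theta = np.zeros(5)
for _ in range(5):
    theta = ggn_step(theta, lambda t: A @ t - b, lambda t: A)
print(np.linalg.norm(A.T @ (A @ theta - b)))  # gradient norm, close to zero
```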
Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, low-rank adaptation (LoRA) is proposed to reduce the…
External link:
http://arxiv.org/abs/2402.03293
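For context on the LoRA technique this entry refers to, here is a minimal PyTorch sketch of the standard low-rank parameterization W x + B A x, in which the frozen base weights carry no gradients and therefore no optimizer states. This is the commonly used formulation, not necessarily the variant studied in the linked paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer plus a trainable low-rank update: W x + B A x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # no grads, no optimizer state
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(trainable, total)   # optimizer states are kept only for the small factors
```

Because Adam-style optimizers keep two extra buffers per trainable parameter, restricting training to the small factors A and B shrinks the optimizer-state memory roughly in proportion to the rank.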
Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions…
External link:
http://arxiv.org/abs/2210.08708
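As a minimal reference for how a reward function enters RL-based text generation, the sketch below implements a plain REINFORCE loss in PyTorch, where a scalar sequence-level reward (e.g. BLEU against a reference) weights the summed token log-probabilities. The reward and baseline here are placeholders, not the reward function proposed in the linked paper.

```python
import torch

def reinforce_loss(token_logprobs: torch.Tensor, reward: float, baseline: float = 0.0):
    """REINFORCE objective for one sampled sequence.

    token_logprobs: log-probabilities of each sampled token, shape (seq_len,).
    reward: scalar score of the whole sequence, e.g. BLEU against a reference.
    baseline: running-average reward subtracted to reduce gradient variance.
    """
    advantage = reward - baseline
    return -(advantage * token_logprobs.sum())

# Toy values; in practice the log-probs come from the decoder's sampled tokens.
logprobs = torch.log(torch.tensor([0.4, 0.7, 0.6], requires_grad=True))
loss = reinforce_loss(logprobs, reward=0.8, baseline=0.5)
loss.backward()   # pushes probability mass toward sequences scoring above the baseline
```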
Open-domain dialogue systems aim to interact with humans through natural language texts in an open-ended fashion. Despite the recent success of super-large dialogue systems such as ChatGPT, using medium-to-small-sized dialogue systems remains the common…
External link:
http://arxiv.org/abs/2209.14627
Author:
Wang, Wenxuan, Jiao, Wenxiang, Hao, Yongchang, Wang, Xing, Shi, Shuming, Tu, Zhaopeng, Lyu, Michael
In this paper, we present a substantial step in better understanding the SOTA sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on studying the impact of the jointly pretrained decoder, which is the main difference…
External link:
http://arxiv.org/abs/2203.08442
Non-Autoregressive machine Translation (NAT) models have demonstrated significant inference speedup but suffer from inferior translation accuracy. The common practice to tackle the problem is transferring the Autoregressive machine Translation (AT) knowledge…
External link:
http://arxiv.org/abs/2010.12868
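The knowledge transfer this entry mentions is commonly realized as sequence-level knowledge distillation, in which the autoregressive teacher's translations replace the original targets for training the NAT student. The sketch below shows only that data-construction step, with a hypothetical teacher_translate interface, and is not the specific method examined in the linked paper.

```python
def build_distilled_corpus(teacher_translate, source_sentences):
    """Sequence-level knowledge distillation data for a NAT student.

    teacher_translate: hypothetical wrapper around a trained autoregressive
        MT model that maps a source sentence to its beam-search translation.
    Returns (source, teacher_translation) pairs; training the NAT model on
    these deterministic targets is the usual way AT knowledge is transferred.
    """
    return [(src, teacher_translate(src)) for src in source_sentences]
```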
Author:
Lei, Ming, Du, Li, Jiao, Hanwei, Cheng, Ying, Zhang, Donglin, Hao, Yongchang, Li, Gangshan, Qiu, Wei, Fan, Quanshui, Li, Chengyao, Chen, Chuanfu, Wang, Fengyang
Published in:
Veterinary Microbiology, 7 December 2012, 160(3-4): 362-368
Academic article
This result is only available to signed-in users.