Showing 1 - 10 of 200 for search: '"Wang, Bailin"'
Author:
Zhang, Yu, Yang, Songlin, Zhu, Ruijie, Zhang, Yue, Cui, Leyang, Wang, Yiqiao, Wang, Bolun, Shi, Freda, Wang, Bailin, Bi, Wei, Zhou, Peng, Fu, Guohong
Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for…
External link:
http://arxiv.org/abs/2409.07146
Author:
Zeng, Zhongshen, Liu, Yinhong, Wan, Yingjia, Li, Jingyao, Chen, Pengguang, Dai, Jianbo, Yao, Yuxuan, Xu, Rongwu, Qi, Zehan, Zhao, Wanru, Shen, Linling, Lu, Jianqiao, Tan, Haochen, Chen, Yukang, Zhang, Hao, Shi, Zhan, Wang, Bailin, Guo, Zhijiang, Jia, Jiaya
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capabilities…
External link:
http://arxiv.org/abs/2406.13975
Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers especially…
External link:
http://arxiv.org/abs/2406.06484
With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase. Notably, recent advancements in multi-round self-improving methods allow LLMs to generate new examples…
External link:
http://arxiv.org/abs/2404.04286
We propose a method to teach multiple large language models (LLM) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood…
External link:
http://arxiv.org/abs/2403.03870
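The abstract above treats the choice of which LLM emits the next token as a latent variable and optimizes the marginal likelihood. A minimal numpy sketch of that marginal, using toy stand-in distributions and a fixed mixing prior `pi` (both hypothetical illustrations, not the paper's learned components):

```python
import numpy as np

rng = np.random.default_rng(1)
V, M, T = 6, 2, 4  # vocab size, number of models, sequence length

# Toy per-model next-token distributions p(x_t | context, model=m);
# in the real setting these would come from LLM forward passes.
logits = rng.standard_normal((T, M, V))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

pi = np.array([0.7, 0.3])       # prior over which model emits each token (assumed fixed here)
x = rng.integers(0, V, size=T)  # an observed token sequence

# Marginal likelihood: sum out the latent model choice at each step,
# p(x) = prod_t sum_m pi[m] * p(x_t | model=m)
per_step = (pi[None, :] * probs[np.arange(T), :, x]).sum(axis=1)
log_marginal = np.log(per_step).sum()
```

Because the latent choice is summed out per step, the objective stays differentiable and needs no supervision about which model "should" produce each token.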
Large-scale neural language models exhibit a remarkable capacity for in-context learning (ICL): they can infer novel functions from datasets provided as input. Most of our current understanding of when and how ICL arises comes from LMs trained on ext…
External link:
http://arxiv.org/abs/2401.12973
Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures inherent in programming languages…
External link:
http://arxiv.org/abs/2401.10716
Transformers with linear attention allow for efficient parallel training but can simultaneously be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time inference complexity. However, linear attention generally underperforms…
External link:
http://arxiv.org/abs/2312.06635
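The abstract above notes that linear attention admits both a parallel (attention-matrix) form and an RNN form with a matrix-valued hidden state. A small numpy sketch of that equivalence on random toy queries, keys, and values (illustrative only, not the paper's hardware-efficient training algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4  # sequence length, head dimension
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form: causal attention without softmax,
# o_t = sum_{j<=t} (q_t . k_j) v_j
A = np.tril(Q @ K.T)   # (T, T) causal score matrix
O_parallel = A @ V

# Recurrent form: a 2D (matrix-valued) hidden state,
# S_t = S_{t-1} + k_t v_t^T, with output o_t = S_t^T q_t
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = S + np.outer(K[t], V[t])
    O_recurrent[t] = S.T @ Q[t]

assert np.allclose(O_parallel, O_recurrent)
```

The parallel form supports efficient training over the whole sequence at once, while the recurrent form carries only the d×d state `S` forward, which is what gives linear-time, constant-memory inference.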
Author:
Tang, Zilu, Agarwal, Mayank, Shypula, Alex, Wang, Bailin, Wijaya, Derry, Chen, Jie, Kim, Yoon
This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models. Across three types of explanations and 19 programming languages constructed from the MultiPL-E dataset…
External link:
http://arxiv.org/abs/2311.07070
Author:
Qiu, Linlu, Jiang, Liwei, Lu, Ximing, Sclar, Melanie, Pyatkin, Valentina, Bhagavatula, Chandra, Wang, Bailin, Kim, Yoon, Choi, Yejin, Dziri, Nouha, Ren, Xiang
The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on…
External link:
http://arxiv.org/abs/2310.08559