Showing 1 - 10 of 200 for search: '"Wang, Bailin"'
Author:
Zhang, Yu, Yang, Songlin, Zhu, Ruijie, Zhang, Yue, Cui, Leyang, Wang, Yiqiao, Wang, Bolun, Shi, Freda, Wang, Bailin, Bi, Wei, Zhou, Peng, Fu, Guohong
Linear attention Transformers and their gated variants, celebrated for enabling parallel training and efficient recurrent inference, still fall short in recall-intensive tasks compared to traditional Transformers and demand significant resources for…
External link:
http://arxiv.org/abs/2409.07146
Author:
Zeng, Zhongshen, Liu, Yinhong, Wan, Yingjia, Li, Jingyao, Chen, Pengguang, Dai, Jianbo, Yao, Yuxuan, Xu, Rongwu, Qi, Zehan, Zhao, Wanru, Shen, Linling, Lu, Jianqiao, Tan, Haochen, Chen, Yukang, Zhang, Hao, Shi, Zhan, Wang, Bailin, Guo, Zhijiang, Jia, Jiaya
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capabilities…
External link:
http://arxiv.org/abs/2406.13975
Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform transformers especially…
External link:
http://arxiv.org/abs/2406.06484
With the widespread adoption of Large Language Models (LLMs), the prevalence of iterative interactions among these models is anticipated to increase. Notably, recent advancements in multi-round self-improving methods allow LLMs to generate new examples…
External link:
http://arxiv.org/abs/2404.04286
We propose a method to teach multiple large language models (LLM) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood…
External link:
http://arxiv.org/abs/2403.03870
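The abstract above treats the choice of which LLM emits the next token as a latent variable and optimizes the marginal likelihood. A minimal numpy sketch of that marginal, using toy stand-in distributions and a fixed mixing prior `pi` (both hypothetical illustrations, not the paper's learned components):

```python
import numpy as np

rng = np.random.default_rng(1)
V, M, T = 6, 2, 4  # vocab size, number of models, sequence length

# Toy per-model next-token distributions p(x_t | context, model=m);
# in the real setting these would come from LLM forward passes.
logits = rng.standard_normal((T, M, V))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

pi = np.array([0.7, 0.3])       # prior over which model emits each token (assumed fixed here)
x = rng.integers(0, V, size=T)  # an observed token sequence

# Marginal likelihood: sum out the latent model choice at each step,
# p(x) = prod_t sum_m pi[m] * p(x_t | model=m)
per_step = (pi[None, :] * probs[np.arange(T), :, x]).sum(axis=1)
log_marginal = np.log(per_step).sum()
```

Because the latent choice is summed out per step, the objective stays differentiable and needs no supervision about which model "should" produce each token.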
Large-scale neural language models exhibit a remarkable capacity for in-context learning (ICL): they can infer novel functions from datasets provided as input. Most of our current understanding of when and how ICL arises comes from LMs trained on ext…
External link:
http://arxiv.org/abs/2401.12973
Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures inherent in programming languages…
External link:
http://arxiv.org/abs/2401.10716
Transformers with linear attention allow for efficient parallel training but can simultaneously be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time inference complexity. However, linear attention generally underperforms…
External link:
http://arxiv.org/abs/2312.06635
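The abstract above notes that linear attention admits both a parallel (attention-matrix) form and an RNN form with a matrix-valued hidden state. A small numpy sketch of that equivalence on random toy queries, keys, and values (illustrative only, not the paper's hardware-efficient training algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4  # sequence length, head dimension
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Parallel form: causal attention without softmax,
# o_t = sum_{j<=t} (q_t . k_j) v_j
A = np.tril(Q @ K.T)   # (T, T) causal score matrix
O_parallel = A @ V

# Recurrent form: a 2D (matrix-valued) hidden state,
# S_t = S_{t-1} + k_t v_t^T, with output o_t = S_t^T q_t
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = S + np.outer(K[t], V[t])
    O_recurrent[t] = S.T @ Q[t]

assert np.allclose(O_parallel, O_recurrent)
```

The parallel form supports efficient training over the whole sequence at once, while the recurrent form carries only the d×d state `S` forward, which is what gives linear-time, constant-memory inference.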
Author:
Tang, Zilu, Agarwal, Mayank, Shypula, Alex, Wang, Bailin, Wijaya, Derry, Chen, Jie, Kim, Yoon
This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models. Across three types of explanations and 19 programming languages constructed from the MultiPL-E dataset…
External link:
http://arxiv.org/abs/2311.07070
Author:
Qiu, Linlu, Jiang, Liwei, Lu, Ximing, Sclar, Melanie, Pyatkin, Valentina, Bhagavatula, Chandra, Wang, Bailin, Kim, Yoon, Choi, Yejin, Dziri, Nouha, Ren, Xiang
The ability to derive underlying principles from a handful of observations and then generalize to novel situations -- known as inductive reasoning -- is central to human intelligence. Prior work suggests that language models (LMs) often fall short on…
External link:
http://arxiv.org/abs/2310.08559