Showing 1 - 10 of 163 for search: '"Rush, Alexander M"'
Author:
Yin, Junjie Oscar, Rush, Alexander M.
Data selection can reduce the amount of training data needed to finetune LLMs; however, the efficacy of data selection scales directly with its compute. Motivated by the practical challenge of compute-constrained finetuning, we consider the setting …
External link:
http://arxiv.org/abs/2410.16208
Author:
Morris, John X., Rush, Alexander M.
Dense document embeddings are central to neural retrieval. The dominant paradigm is to train and construct embeddings by running encoders directly on individual documents. In this work, we argue that these embeddings, while effective, are implicitly …
External link:
http://arxiv.org/abs/2410.02525
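The dominant paradigm the abstract describes can be sketched in a few lines: encode each document independently into a vector, then rank by cosine similarity to a query vector. This is an illustrative minimal sketch of standard dense retrieval, not the paper's method; the embeddings here are plain arrays, whereas a real system would produce them with a trained encoder.

```python
import numpy as np

def retrieve(doc_embeddings, query_embedding, top_k=3):
    """Rank documents by cosine similarity to a query embedding.

    doc_embeddings: (num_docs, dim) array, one row per document.
    query_embedding: (dim,) array for the query.
    Returns indices of the top_k most similar documents.
    """
    # Normalize so the dot product equals cosine similarity.
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = docs @ q
    # Highest-scoring documents first.
    return np.argsort(scores)[::-1][:top_k]
```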
Author:
Lu, Yi, Yan, Jing Nathan, Yang, Songlin, Chiu, Justin T., Ren, Siyu, Yuan, Fei, Zhao, Wenting, Wu, Zhiyong, Rush, Alexander M.
Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending …
External link:
http://arxiv.org/abs/2409.12181
Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the challenge of …
External link:
http://arxiv.org/abs/2408.15237
$K$-nearest neighbor language models ($k$NN-LMs), which integrate retrieval with next-word prediction, have demonstrated strong performance in language modeling as well as downstream NLP benchmarks. These results have led researchers to argue that …
External link:
http://arxiv.org/abs/2408.11815
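The retrieval-plus-prediction integration the abstract refers to is, in the standard $k$NN-LM formulation, an interpolation of the base model's next-token distribution with a distribution built from retrieved neighbors. A minimal sketch under that standard formulation (the datastore, hyperparameters, and function names here are illustrative, not the paper's):

```python
import numpy as np

def knn_lm_next_token(lm_probs, datastore_keys, datastore_next_tokens,
                      query, k=4, temperature=1.0, lam=0.25):
    """Interpolate a base LM's next-token distribution with a kNN
    distribution from a datastore of (context vector, next token) pairs.

    lm_probs: (vocab,) base model distribution over the next token.
    datastore_keys: (N, dim) stored context vectors.
    datastore_next_tokens: (N,) token id observed after each context.
    query: (dim,) context vector for the current position.
    """
    # L2 distance from the query to every stored context vector.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Softmax over negative distances of the k retrieved neighbors.
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    # Scatter neighbor weights onto their recorded next tokens.
    knn_probs = np.zeros_like(lm_probs)
    for w, tok in zip(weights, datastore_next_tokens[nearest]):
        knn_probs[tok] += w
    # p = lambda * p_kNN + (1 - lambda) * p_LM
    return lam * knn_probs + (1 - lam) * lm_probs
```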
When seeking information from unfamiliar documents, users frequently pose questions that cannot be answered by the documents. While existing large language models (LLMs) identify these unanswerable questions, they do not assist users in reformulating …
External link:
http://arxiv.org/abs/2407.17469
Author:
Akhauri, Yash, AbouElhamayed, Ahmed F, Dotzel, Jordan, Zhang, Zhiru, Rush, Alexander M, Huda, Safeen, Abdelfattah, Mohamed S
The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated efficiency techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs …
External link:
http://arxiv.org/abs/2406.16635
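"Input-dependent sparsity pattern" can be made concrete with a toy MLP block: for each input, only the hidden units with the largest activations are kept, so which units are skipped changes per token. This is an illustrative top-k sketch of the general idea, not the paper's predictor-based method.

```python
import numpy as np

def contextual_sparse_mlp(x, W1, W2, keep_fraction=0.25):
    """One MLP block with contextual (input-dependent) sparsity.

    x: (d_in,) input vector; W1: (d_in, d_hidden); W2: (d_hidden, d_out).
    Only the top keep_fraction of hidden units, ranked by activation
    for THIS input, contribute to the output.
    """
    h = np.maximum(x @ W1, 0.0)                  # ReLU hidden activations
    k = max(1, int(keep_fraction * h.shape[-1]))
    keep = np.argsort(h)[-k:]                    # indices of top-k units
    mask = np.zeros_like(h)
    mask[keep] = 1.0
    return (h * mask) @ W2                       # sparse hidden -> output
```

With `keep_fraction=1.0` the block reduces to the dense computation; smaller fractions trade accuracy for fewer active hidden units per input.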
Author:
Wang, Junxiong, Mousavi, Ali, Attia, Omar, Pradeep, Ronak, Potdar, Saloni, Rush, Alexander M., Minhas, Umar Farooq, Li, Yunyao
Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to …
External link:
http://arxiv.org/abs/2404.01626
Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting, standard autoregressive Transformers scale poorly …
External link:
http://arxiv.org/abs/2401.13660
Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information …
External link:
http://arxiv.org/abs/2311.13647