Showing 1 - 10 of 1,276 for search: '"ANKIT SINGH"'
Author:
Rawat, Ankit Singh, Sadhanala, Veeranjaneyulu, Rostamizadeh, Afshin, Chakrabarti, Ayan, Jitkrittum, Wittawat, Feinberg, Vladimir, Kim, Seungyeon, Harutyunyan, Hrayr, Saunshi, Nikunj, Nado, Zachary, Shivanna, Rakesh, Reddi, Sashank J., Menon, Aditya Krishna, Anil, Rohan, Kumar, Sanjiv
A primary challenge in large language model (LLM) development is their onerous pre-training cost. Typically, such pre-training involves optimizing a self-supervised objective (such as next-token prediction) over a large corpus. This paper explores …
External link: http://arxiv.org/abs/2410.18779
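To make the objective mentioned in the snippet concrete, here is a minimal numpy sketch of next-token prediction as a cross-entropy loss; the function name, shapes, and toy data are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of next-token prediction.

    logits:  (T, V) unnormalized scores at each position of a sequence
    targets: (T,)   index of the true next token at each position
    A real pre-training run batches many sequences and produces the
    logits with a Transformer; this only illustrates the loss itself.
    """
    # numerically stable log-softmax over the vocabulary
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # negative log-probability assigned to each true next token
    return -log_probs[np.arange(len(targets)), targets].mean()

# toy usage: 5 positions, vocabulary of 8 tokens
rng = np.random.default_rng(0)
print(next_token_loss(rng.normal(size=(5, 8)), rng.integers(0, 8, size=5)))
```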
Modern ML systems increasingly augment input instances with additional relevant information to enhance final prediction. Despite growing interest in such retrieval-augmented models, their fundamental properties and training are not well understood. …
External link: http://arxiv.org/abs/2408.15399
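For intuition only, a minimal sketch of retrieval augmentation under simplifying assumptions (dot-product retrieval, a fixed linear head); `retrieve`, `w`, and the shapes are hypothetical names for illustration, not the paper's method:

```python
import numpy as np

def retrieve(query_vec, corpus_vecs, k=2):
    """Return the k corpus rows most similar to the query (dot-product score)."""
    scores = corpus_vecs @ query_vec
    return corpus_vecs[np.argsort(scores)[-k:]]

def retrieval_augmented_predict(query_vec, corpus_vecs, w):
    """Predict from the query concatenated with a summary of its retrieved
    neighbors. A real retrieval-augmented model would use learned encoders
    rather than raw vectors and a linear head."""
    neighbors = retrieve(query_vec, corpus_vecs)
    augmented = np.concatenate([query_vec, neighbors.mean(axis=0)])
    return augmented @ w

# toy usage
rng = np.random.default_rng(0)
d = 8
corpus = rng.normal(size=(100, d))
print(retrieval_augmented_predict(rng.normal(size=d), corpus, rng.normal(size=2 * d)))
```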
Author:
Godbole, Ameya, Monath, Nicholas, Kim, Seungyeon, Rawat, Ankit Singh, McCallum, Andrew, Zaheer, Manzil
In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge. One compelling hypothesis is that hallucinations occur when a language model is given a generation task outside its parametric knowledge …
External link: http://arxiv.org/abs/2408.10490
Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized …
External link: http://arxiv.org/abs/2407.10005
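The equivalence the snippet alludes to can be checked numerically: one gradient-descent step from zero on the in-context least-squares loss gives the same query prediction as an unnormalized linear-attention readout. A toy numpy sketch under these assumptions, not the paper's construction verbatim:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
X = rng.normal(size=(n, d))      # in-context inputs x_i
w_star = rng.normal(size=d)
y = X @ w_star                   # in-context labels y_i
x_q = rng.normal(size=d)         # query input
eta = 0.1                        # step size

# One gradient step from w = 0 on L(w) = 0.5 * sum_i (y_i - x_i @ w)**2:
# grad at 0 is -X.T @ y, so w_1 = eta * X.T @ y.
w_gd = eta * X.T @ y

# Linear-attention readout for the query: values y_i weighted by the
# unnormalized scores <x_i, x_q>, scaled by eta.
y_attn = eta * (y @ (X @ x_q))

print(np.allclose(x_q @ w_gd, y_attn))   # True: the two predictions coincide
```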
Author:
Ji, Ziwei, Jain, Himanshu, Veit, Andreas, Reddi, Sashank J., Jayasumana, Sadeep, Rawat, Ankit Singh, Menon, Aditya Krishna, Yu, Felix, Kumar, Sanjiv
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document …
External link: http://arxiv.org/abs/2406.17968
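A minimal sketch of the two scoring factorizations the snippet contrasts, with random matrices standing in for trained encoders; all names and shapes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
q, doc = rng.normal(size=d), rng.normal(size=d)

# Dual-Encoder: encode query and document independently, score by dot
# product. Because the document embedding does not depend on the query,
# it can be precomputed and indexed.
W_q, W_d = rng.normal(size=(d, d)), rng.normal(size=(d, d))
de_score = (W_q @ q) @ (W_d @ doc)

# Cross-Encoder: score from a joint encoding of the (query, document)
# pair, so nothing can be precomputed per document.
W_joint, v = rng.normal(size=(d, 2 * d)), rng.normal(size=d)
ce_score = v @ np.tanh(W_joint @ np.concatenate([q, doc]))

print(de_score, ce_score)
```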
Author:
Wang, Congchao, Augenstein, Sean, Rush, Keith, Jitkrittum, Wittawat, Narasimhan, Harikrishna, Rawat, Ankit Singh, Menon, Aditya Krishna, Go, Alec
Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employs smaller models for simpler queries …
External link: http://arxiv.org/abs/2406.00060
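A minimal sketch of such a cascade, assuming a max-probability confidence threshold as the deferral rule (real cascades, including the one studied here, may use learned deferral rules instead):

```python
import numpy as np

def cascade_predict(x, small_model, large_model, threshold=0.9):
    """Answer with the small model when it is confident; otherwise defer.

    Each model maps x to a probability vector over classes; the max
    probability serves as a simple confidence score.
    """
    p_small = small_model(x)
    if p_small.max() >= threshold:
        return int(np.argmax(p_small)), "small"
    return int(np.argmax(large_model(x))), "large"

# toy usage with stand-ins for real models
small = lambda x: np.array([0.95, 0.05])
large = lambda x: np.array([0.60, 0.40])
print(cascade_predict("easy query", small, large))   # answered by the small model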
Author:
Narasimhan, Harikrishna, Jitkrittum, Wittawat, Rawat, Ankit Singh, Kim, Seungyeon, Gupta, Neha, Menon, Aditya Krishna, Kumar, Sanjiv
Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule …
External link: http://arxiv.org/abs/2405.19261
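For reference, the standard speculative-sampling acceptance rule for a single token, sketched in numpy; this is the generic mechanism, not necessarily the exact variant studied in the paper:

```python
import numpy as np

def speculative_step(p, q, rng):
    """One token of speculative sampling.

    q: draft model's distribution, p: target model's distribution.
    A token drawn from q is kept with probability min(1, p/q); on
    rejection we resample from the residual max(p - q, 0), which makes
    the returned token exactly distributed according to p.
    """
    x = rng.choice(len(q), p=q)                 # draft proposes a token
    if rng.random() < min(1.0, p[x] / q[x]):    # target verifies it
        return x
    residual = np.maximum(p - q, 0)
    return rng.choice(len(p), p=residual / residual.sum())

# toy usage over a 3-token vocabulary
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # target distribution
q = np.array([0.4, 0.4, 0.2])   # draft distribution
print(speculative_step(p, q, rng))
```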
Author:
Gupta, Neha, Narasimhan, Harikrishna, Jitkrittum, Wittawat, Rawat, Ankit Singh, Menon, Aditya Krishna, Kumar, Sanjiv
Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, …
External link: http://arxiv.org/abs/2404.10136
Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success …
External link: http://arxiv.org/abs/2403.08081
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data …
External link: http://arxiv.org/abs/2402.13512
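A minimal numpy sketch of a 1-layer self-attention forward pass over a prompt, omitting masking and the rest of a full Transformer; the names and shapes are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """One self-attention layer over a prompt X of shape (T, d).

    Attention weights come from scaled query-key inner products; the
    output is an attention-weighted mixture of value vectors.
    """
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(W_k.shape[1])
    return softmax(scores) @ (X @ W_v)

# toy usage: a prompt of 5 tokens with dimension 8
rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
print(self_attention(X, *W).shape)   # (5, 8)
```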