Showing 1 - 10 of 1,276 for search: '"ANKIT SINGH"'
Author:
Rawat, Ankit Singh, Sadhanala, Veeranjaneyulu, Rostamizadeh, Afshin, Chakrabarti, Ayan, Jitkrittum, Wittawat, Feinberg, Vladimir, Kim, Seungyeon, Harutyunyan, Hrayr, Saunshi, Nikunj, Nado, Zachary, Shivanna, Rakesh, Reddi, Sashank J., Menon, Aditya Krishna, Anil, Rohan, Kumar, Sanjiv
A primary challenge in large language model (LLM) development is their onerous pre-training cost. Typically, such pre-training involves optimizing a self-supervised objective (such as next-token prediction) over a large corpus. This paper explores …
External link: http://arxiv.org/abs/2410.18779
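To make the objective mentioned in the snippet concrete, here is a minimal numpy sketch of next-token prediction as a cross-entropy loss; the function name, shapes, and toy data are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of next-token prediction.

    logits:  (T, V) unnormalized scores at each position of a sequence
    targets: (T,)   index of the true next token at each position
    A real pre-training run batches many sequences and produces the
    logits with a Transformer; this only illustrates the loss itself.
    """
    # numerically stable log-softmax over the vocabulary
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # negative log-probability assigned to each true next token
    return -log_probs[np.arange(len(targets)), targets].mean()

# toy usage: 5 positions, vocabulary of 8 tokens
rng = np.random.default_rng(0)
print(next_token_loss(rng.normal(size=(5, 8)), rng.integers(0, 8, size=5)))
```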
Modern ML systems increasingly augment input instances with additional relevant information to enhance final prediction. Despite growing interest in such retrieval-augmented models, their fundamental properties and training are not well understood. …
External link: http://arxiv.org/abs/2408.15399
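For intuition only, a minimal sketch of retrieval augmentation under simplifying assumptions (dot-product retrieval, a fixed linear head); `retrieve`, `w`, and the shapes are hypothetical names for illustration, not the paper's method:

```python
import numpy as np

def retrieve(query_vec, corpus_vecs, k=2):
    """Return the k corpus rows most similar to the query (dot-product score)."""
    scores = corpus_vecs @ query_vec
    return corpus_vecs[np.argsort(scores)[-k:]]

def retrieval_augmented_predict(query_vec, corpus_vecs, w):
    """Predict from the query concatenated with a summary of its retrieved
    neighbors. A real retrieval-augmented model would use learned encoders
    rather than raw vectors and a linear head."""
    neighbors = retrieve(query_vec, corpus_vecs)
    augmented = np.concatenate([query_vec, neighbors.mean(axis=0)])
    return augmented @ w

# toy usage
rng = np.random.default_rng(0)
d = 8
corpus = rng.normal(size=(100, d))
print(retrieval_augmented_predict(rng.normal(size=d), corpus, rng.normal(size=2 * d)))
```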
Author:
Godbole, Ameya, Monath, Nicholas, Kim, Seungyeon, Rawat, Ankit Singh, McCallum, Andrew, Zaheer, Manzil
In text generation, hallucinations refer to the generation of seemingly coherent text that contradicts established knowledge. One compelling hypothesis is that hallucinations occur when a language model is given a generation task outside its parametric knowledge …
External link: http://arxiv.org/abs/2408.10490
Recent research has shown that Transformers with linear attention are capable of in-context learning (ICL) by implementing a linear estimator through gradient descent steps. However, the existing results on the optimization landscape apply under stylized …
External link: http://arxiv.org/abs/2407.10005
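The equivalence the snippet alludes to can be checked numerically: one gradient-descent step from zero on the in-context least-squares loss gives the same query prediction as an unnormalized linear-attention readout. A toy numpy sketch under these assumptions, not the paper's construction verbatim:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 16
X = rng.normal(size=(n, d))      # in-context inputs x_i
w_star = rng.normal(size=d)
y = X @ w_star                   # in-context labels y_i
x_q = rng.normal(size=d)         # query input
eta = 0.1                        # step size

# One gradient step from w = 0 on L(w) = 0.5 * sum_i (y_i - x_i @ w)**2:
# grad at 0 is -X.T @ y, so w_1 = eta * X.T @ y.
w_gd = eta * X.T @ y

# Linear-attention readout for the query: values y_i weighted by the
# unnormalized scores <x_i, x_q>, scaled by eta.
y_attn = eta * (y @ (X @ x_q))

print(np.allclose(x_q @ w_gd, y_attn))   # True: the two predictions coincide
```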
Author:
Ji, Ziwei, Jain, Himanshu, Veit, Andreas, Reddi, Sashank J., Jayasumana, Sadeep, Rawat, Ankit Singh, Menon, Aditya Krishna, Yu, Felix, Kumar, Sanjiv
Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document …
External link: http://arxiv.org/abs/2406.17968
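A minimal sketch of the two scoring factorizations the snippet contrasts, with random matrices standing in for trained encoders; all names and shapes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
q, doc = rng.normal(size=d), rng.normal(size=d)

# Dual-Encoder: encode query and document independently, score by dot
# product. Because the document embedding does not depend on the query,
# it can be precomputed and indexed.
W_q, W_d = rng.normal(size=(d, d)), rng.normal(size=(d, d))
de_score = (W_q @ q) @ (W_d @ doc)

# Cross-Encoder: score from a joint encoding of the (query, document)
# pair, so nothing can be precomputed per document.
W_joint, v = rng.normal(size=(d, 2 * d)), rng.normal(size=d)
ce_score = v @ np.tanh(W_joint @ np.concatenate([q, doc]))

print(de_score, ce_score)
```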
Author:
Wang, Congchao, Augenstein, Sean, Rush, Keith, Jitkrittum, Wittawat, Narasimhan, Harikrishna, Rawat, Ankit Singh, Menon, Aditya Krishna, Go, Alec
Reducing serving cost and latency is a fundamental concern for the deployment of language models (LMs) in business applications. To address this, cascades of LMs offer an effective solution that conditionally employs smaller models for simpler queries …
External link: http://arxiv.org/abs/2406.00060
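A minimal sketch of such a cascade, assuming a max-probability confidence threshold as the deferral rule (real cascades, including the one studied here, may use learned deferral rules instead):

```python
import numpy as np

def cascade_predict(x, small_model, large_model, threshold=0.9):
    """Answer with the small model when it is confident; otherwise defer.

    Each model maps x to a probability vector over classes; the max
    probability serves as a simple confidence score.
    """
    p_small = small_model(x)
    if p_small.max() >= threshold:
        return int(np.argmax(p_small)), "small"
    return int(np.argmax(large_model(x))), "large"

# toy usage with stand-ins for real models
small = lambda x: np.array([0.95, 0.05])
large = lambda x: np.array([0.60, 0.40])
print(cascade_predict("easy query", small, large))   # answered by the small model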
Author:
Narasimhan, Harikrishna, Jitkrittum, Wittawat, Rawat, Ankit Singh, Kim, Seungyeon, Gupta, Neha, Menon, Aditya Krishna, Kumar, Sanjiv
Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule …
External link: http://arxiv.org/abs/2405.19261
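For reference, the standard speculative-sampling acceptance rule for a single token, sketched in numpy; this is the generic mechanism, not necessarily the exact variant studied in the paper:

```python
import numpy as np

def speculative_step(p, q, rng):
    """One token of speculative sampling.

    q: draft model's distribution, p: target model's distribution.
    A token drawn from q is kept with probability min(1, p/q); on
    rejection we resample from the residual max(p - q, 0), which makes
    the returned token exactly distributed according to p.
    """
    x = rng.choice(len(q), p=q)                 # draft proposes a token
    if rng.random() < min(1.0, p[x] / q[x]):    # target verifies it
        return x
    residual = np.maximum(p - q, 0)
    return rng.choice(len(p), p=residual / residual.sum())

# toy usage over a 3-token vocabulary
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # target distribution
q = np.array([0.4, 0.4, 0.2])   # draft distribution
print(speculative_step(p, q, rng))
```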
Author:
Gupta, Neha, Narasimhan, Harikrishna, Jitkrittum, Wittawat, Rawat, Ankit Singh, Menon, Aditya Krishna, Kumar, Sanjiv
Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, …
External link: http://arxiv.org/abs/2404.10136
Transformer-based language models are trained on large datasets to predict the next token given an input sequence. Despite this simple training objective, they have led to revolutionary advances in natural language processing. Underlying this success …
External link: http://arxiv.org/abs/2403.08081
Modern language models rely on the transformer architecture and attention mechanism to perform language understanding and text generation. In this work, we study learning a 1-layer self-attention model from a set of prompts and associated output data …
External link: http://arxiv.org/abs/2402.13512
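A minimal numpy sketch of a 1-layer self-attention forward pass over a prompt, omitting masking and the rest of a full Transformer; the names and shapes are illustrative assumptions rather than the paper's setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """One self-attention layer over a prompt X of shape (T, d).

    Attention weights come from scaled query-key inner products; the
    output is an attention-weighted mixture of value vectors.
    """
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(W_k.shape[1])
    return softmax(scores) @ (X @ W_v)

# toy usage: a prompt of 5 tokens with dimension 8
rng = np.random.default_rng(0)
T, d = 5, 8
X = rng.normal(size=(T, d))
W = [rng.normal(size=(d, d)) for _ in range(3)]
print(self_attention(X, *W).shape)   # (5, 8)
```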