Showing 1 - 10 of 458 for search: '"Hensman, P."'
Author:
Kang, Hao, Bharadwaj, Srikant, Hensman, James, Krishna, Tushar, Ruhle, Victor, Rajmohan, Saravan
Large language model (LLM) inference demands a significant amount of computation and memory, especially in the key attention mechanism. While techniques such as quantization and acceleration algorithms like FlashAttention have improved the efficiency of…
External link:
http://arxiv.org/abs/2412.08585
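For context on why attention dominates inference cost, here is a minimal NumPy sketch of one decode step of generic scaled dot-product attention over a KV cache. This is background illustration only, not the paper's technique, and all names are made up.

# One decode step of generic scaled dot-product attention with a KV cache.
# Illustrates why attention memory grows with context length.
import numpy as np

def attend(q, k_cache, v_cache):
    """q is (d,); k_cache and v_cache are (t, d) for t past tokens."""
    d = q.shape[-1]
    scores = k_cache @ q / np.sqrt(d)           # (t,) similarity scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over past tokens
    return weights @ v_cache                    # (d,) attended output

t, d = 4096, 128                                # context length, head dim
k_cache = np.random.randn(t, d).astype(np.float32)
v_cache = np.random.randn(t, d).astype(np.float32)
out = attend(np.random.randn(d).astype(np.float32), k_cache, v_cache)
# Cache memory grows linearly with context: 2 * t * d * 4 bytes per head.
print(out.shape, 2 * t * d * 4, "bytes per head")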
Author:
Scetbon, Meyer, Hensman, James
We consider the problem of model compression for Large Language Models (LLMs) at post-training time, where the task is to compress a well-trained model using only a small set of calibration input data. In this work, we introduce a new low-rank approach…
External link:
http://arxiv.org/abs/2412.07902
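The abstract is cut off before the method is described. As a generic illustration of the low-rank idea only (the paper's actual approach is calibration-aware and differs), here is a minimal sketch compressing one weight matrix via truncated SVD:

# Generic post-training low-rank compression of a weight matrix via SVD.
# A sketch of the low-rank idea, not the paper's method.
import numpy as np

def low_rank_compress(W, rank):
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]      # (m, rank) left factor
    B = Vt[:rank]                   # (rank, n) right factor
    return A, B                     # W is approximated by A @ B

W = np.random.randn(512, 512)
A, B = low_rank_compress(W, rank=64)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params: {W.size} -> {A.size + B.size}, rel. error {err:.3f}")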
Recent works on compression of large language models (LLMs) using quantization considered reparameterizing the architecture such that weights are distributed on the sphere. This demonstrably improves the ability to quantize by increasing the mathematical…
External link:
http://arxiv.org/abs/2410.16926
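A hedged sketch of the reparameterization idea the abstract mentions: put weight rows on the unit sphere (storing per-row norms separately) and quantize the directions on a uniform integer grid. This is an assumption-laden illustration, not the paper's exact scheme.

# Sketch: row-wise sphere reparameterization before uniform quantization.
# Assumes a simple per-tensor uniform grid; the paper's scheme differs.
import numpy as np

def quantize_on_sphere(W, bits=4):
    norms = np.linalg.norm(W, axis=1, keepdims=True)   # per-row scale
    D = W / norms                                      # rows on the unit sphere
    levels = 2 ** (bits - 1) - 1
    step = np.abs(D).max() / levels
    Q = np.round(D / step).astype(np.int8)             # integer directions
    return Q, step, norms                              # W ~ norms * Q * step

W = np.random.randn(256, 256)
Q, step, norms = quantize_on_sphere(W)
err = np.linalg.norm(W - norms * Q * step) / np.linalg.norm(W)
print(f"4-bit rel. error: {err:.3f}")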
In this paper, we propose Knowledge Base augmented Language Model (KBLaM), a new method for augmenting Large Language Models (LLMs) with external knowledge. KBLaM works with a knowledge base (KB) constructed from a corpus of documents, transforming each…
External link:
http://arxiv.org/abs/2410.10450
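The abstract is truncated before the details, so the following is only a loose sketch of the general idea of letting attention see KB-derived key/value pairs alongside ordinary tokens. The encoders here are random stand-ins and every name is hypothetical; this is not KBLaM's actual architecture.

# Loose sketch: append KB-derived (key, value) pairs to an attention step.
# Random stand-in encoders; purely illustrative, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
d = 64
kb = ["Paris is the capital of France", "Water boils at 100 C"]

def encode(text):                       # stand-in for a learned encoder
    local = np.random.default_rng(abs(hash(text)) % 2**32)
    return local.standard_normal(d)

kb_keys = np.stack([encode(e) for e in kb])        # (n_kb, d)
kb_vals = np.stack([encode(e[::-1]) for e in kb])  # distinct stand-in values

def attend_with_kb(q, keys, vals):
    keys = np.concatenate([keys, kb_keys])         # token keys + KB keys
    vals = np.concatenate([vals, kb_vals])
    s = keys @ q / np.sqrt(d)
    w = np.exp(s - s.max()); w /= w.sum()          # softmax over both sources
    return w @ vals

out = attend_with_kb(rng.standard_normal(d),
                     rng.standard_normal((5, d)),
                     rng.standard_normal((5, d)))
print(out.shape)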
Author:
Ashkboos, Saleh, Mohtashami, Amirkeivan, Croci, Maximilian L., Li, Bo, Cameron, Pashmina, Jaggi, Martin, Alistarh, Dan, Hoefler, Torsten, Hensman, James
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without…
External link:
http://arxiv.org/abs/2404.00456
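A minimal sketch of the rotation trick the abstract describes: fold an orthogonal matrix Q into the weights (W -> W @ Q) and its transpose into the activations (x -> Q.T @ x), leaving the product W @ x unchanged while spreading activation outliers across dimensions so quantization is easier. This uses a random orthogonal Q for illustration; QuaRot itself uses Hadamard transforms.

# Rotation leaves the layer output unchanged but removes outliers.
import numpy as np

rng = np.random.default_rng(0)
d = 256
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrix

W = rng.standard_normal((d, d))
x = rng.standard_normal(d)
x[7] = 50.0                                        # inject an activation outlier

y_ref = W @ x
y_rot = (W @ Q) @ (Q.T @ x)                        # mathematically identical
assert np.allclose(y_ref, y_rot)

print("max |x| before:", np.abs(x).max())
print("max |x| after :", np.abs(Q.T @ x).max())    # outlier spread out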
Author:
Wu, Haolun, Yuan, Ye, Mikaelyan, Liana, Meulemans, Alexander, Liu, Xue, Hensman, James, Mitra, Bhaskar
Recent advances in machine learning have significantly impacted the field of information extraction, with Language Models (LMs) playing a pivotal role in extracting structured information from unstructured text. Prior works typically represent information…
External link:
http://arxiv.org/abs/2402.04437
Author:
Ashkboos, Saleh, Croci, Maximilian L., Nascimento, Marcelo Gennari do, Hoefler, Torsten, Hensman, James
Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources. Sparsification provides a solution to alleviate these resource constraints, and recent…
External link:
http://arxiv.org/abs/2401.15024
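The abstract is cut off before the method. As one generic structured-compression idea consistent with it, the sketch below uses PCA of calibration activations to project onto a smaller basis and shrink a weight matrix accordingly. This is an assumption-laden illustration, not the paper's procedure.

# Sketch: shrink a layer by projecting onto top principal components
# of calibration activations. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d, d_small, n = 256, 192, 1024

scales = np.linspace(1.0, 0.05, d)
X = rng.standard_normal((n, d)) * scales    # activations with decaying spectrum
W = rng.standard_normal((d, d))

_, _, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
P = Vt[:d_small].T                          # (d, d_small) projection basis

W_sliced = P.T @ W                          # (d_small, d) smaller weight
X_sliced = X @ P                            # activations in the sliced basis

full = X @ W
approx = X_sliced @ W_sliced                # approximates the full output
err = np.linalg.norm(full - approx) / np.linalg.norm(full)
print(f"{d} -> {d_small} dims, rel. output error {err:.3f}")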
Structured (dictionary-like) data presents challenges for left-to-right language models, as they can struggle with structured entities for a wide variety of reasons such as formatting and sensitivity to the order in which attributes are presented. Ta…
External link:
http://arxiv.org/abs/2312.05253
Published in:
Research Involvement and Engagement, Vol 10, Iss 1, Pp 1-11 (2024)
Abstract: Background: Public and patient involvement is critical to ensure that research is relevant and addresses what matters most to the person through co-production. Involvement at the design stage, where ideas for research are developed prior to fo…
Externí odkaz:
https://doaj.org/article/d012304f15034588b63d6ecd4755487d
Author:
Randerson, Sam A., Zotev, Panaiot G., Hu, Xuerong, Knight, Alexander, Wang, Yadong, Nagarkar, Sharada, Hensman, Dominic, Wang, Yue, Tartakovskii, Alexander I.
Dielectric nanoresonators have been shown to circumvent the heavy optical losses associated with plasmonic devices; however, they suffer from less confined resonances. By constructing a hybrid system of both dielectric and metallic materials, one can…
External link:
http://arxiv.org/abs/2304.02537