Výsledky vyhledávání

Report

QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead

Autor: Zandieh, Amir, Daliri, Majid, Han, Insu

Serving LLMs requires substantial memory due to the storage requirements of Key-Value (KV) embeddings in the KV cache, which grows with sequence length. An effective approach to compress KV cache is quantization. However, traditional quantization met

Externí odkaz: http://arxiv.org/abs/2406.03482

Zobrazit plný text záznamu

Report

SubGen: Token Generation in Sublinear Time and Memory

Autor: Zandieh, Amir, Han, Insu, Mirrokni, Vahab, Karbasi, Amin

Despite the significant success of large language models (LLMs), their extensive memory requirements pose challenges for deploying them in long-context token generation. The substantial memory footprint of LLM decoders arises from the necessity to st

Externí odkaz: http://arxiv.org/abs/2402.06082

Zobrazit plný text záznamu

Report

HyperAttention: Long-context Attention in Near-Linear Time

Autor: Han, Insu, Jayaram, Rajesh, Karbasi, Amin, Mirrokni, Vahab, Woodruff, David P., Zandieh, Amir

We present an approximate attention mechanism named HyperAttention to address the computational challenges posed by the growing complexity of long contexts used in Large Language Models (LLMs). Recent work suggests that in the worst-case scenario, qu

Externí odkaz: http://arxiv.org/abs/2310.05869

Zobrazit plný text záznamu

Report

KDEformer: Accelerating Transformers via Kernel Density Estimation

Autor: Zandieh, Amir, Han, Insu, Daliri, Majid, Karbasi, Amin

Dot-product attention mechanism plays a crucial role in modern deep architectures (e.g., Transformer) for sequence modeling, however, na\"ive exact computation of this model incurs quadratic time and memory complexities in sequence length, hindering

Externí odkaz: http://arxiv.org/abs/2302.02451

Zobrazit plný text záznamu

Report

Fast Neural Kernel Embeddings for General Activations

Autor: Han, Insu, Zandieh, Amir, Lee, Jaehoon, Novak, Roman, Xiao, Lechao, Karbasi, Amin

Infinite width limit has shed light on generalization and optimization aspects of deep learning by establishing connections between neural networks and kernel methods. Despite their importance, the utility of these kernel methods was limited in large

Externí odkaz: http://arxiv.org/abs/2209.04121

Zobrazit plný text záznamu

Report

Scalable MCMC Sampling for Nonsymmetric Determinantal Point Processes

Autor: Han, Insu, Gartrell, Mike, Dohmatob, Elvis, Karbasi, Amin

A determinantal point process (DPP) is an elegant model that assigns a probability to every subset of a collection of $n$ items. While conventionally a DPP is parameterized by a symmetric kernel matrix, removing this symmetry constraint, resulting in

Externí odkaz: http://arxiv.org/abs/2207.00486

Zobrazit plný text záznamu

Report

Near Optimal Reconstruction of Spherical Harmonic Expansions

Autor: Zandieh, Amir, Han, Insu, Avron, Haim

We propose an algorithm for robust recovery of the spherical harmonic expansion of functions defined on the d-dimensional unit sphere $\mathbb{S}^{d-1}$ using a near-optimal number of function evaluations. We show that for any $f \in L^2(\mathbb{S}^{

Externí odkaz: http://arxiv.org/abs/2202.12995

Zobrazit plný text záznamu

Report

Random Gegenbauer Features for Scalable Kernel Methods

Autor: Han, Insu, Zandieh, Amir, Avron, Haim

We propose efficient random features for approximating a new and rich class of kernel functions that we refer to as Generalized Zonal Kernels (GZK). Our proposed GZK family, generalizes the zonal kernels (i.e., dot-product kernels on the unit sphere)

Externí odkaz: http://arxiv.org/abs/2202.03474

Zobrazit plný text záznamu

Report

Scalable Sampling for Nonsymmetric Determinantal Point Processes

Autor: Han, Insu, Gartrell, Mike, Gillenwater, Jennifer, Dohmatob, Elvis, Karbasi, Amin

A determinantal point process (DPP) on a collection of $M$ items is a model, parameterized by a symmetric kernel matrix, that assigns a probability to every subset of those items. Recent work shows that removing the kernel symmetry constraint, yieldi

Externí odkaz: http://arxiv.org/abs/2201.08417

Zobrazit plný text záznamu

Report

Scaling Neural Tangent Kernels via Sketching and Random Features

Autor: Zandieh, Amir, Han, Insu, Avron, Haim, Shoham, Neta, Kim, Chaewon, Shin, Jinwoo

The Neural Tangent Kernel (NTK) characterizes the behavior of infinitely-wide neural networks trained under least squares loss by gradient descent. Recent works also report that NTK regression can outperform finitely-wide neural networks trained on s

Externí odkaz: http://arxiv.org/abs/2106.07880

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání