Showing 1 - 10 of 56 for search: '"Alman, Josh"'
We study the average-case version of the Orthogonal Vectors problem, in which one is given as input $n$ vectors from $\{0,1\}^d$, chosen randomly so that each coordinate is $1$ independently with probability $p$. Kane and Williams [ITCS 2019] …
External link:
http://arxiv.org/abs/2410.22477
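
The setup in the snippet above is concrete enough to illustrate. Below is a minimal sketch (not taken from the paper) that samples an average-case Orthogonal Vectors instance with the stated coordinate distribution and looks for an orthogonal pair by the naive quadratic scan; the names n, d, p follow the abstract, while the function names and parameter values are illustrative. The algorithms studied in such papers aim to beat this quadratic baseline.

import random

def random_instance(n, d, p, seed=0):
    """n vectors from {0,1}^d, each coordinate 1 independently with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(d)] for _ in range(n)]

def has_orthogonal_pair(vectors):
    """Naive O(n^2 d) scan: is there a pair with zero inner product?"""
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            if all(a * b == 0 for a, b in zip(vectors[i], vectors[j])):
                return True
    return False

vecs = random_instance(n=200, d=20, p=0.3)
print(has_orthogonal_pair(vecs))
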
Author:
Alman, Josh, Yu, Hantao
Algebraic matrix multiplication algorithms are designed by bounding the rank of matrix multiplication tensors, and then using a recursive method. However, designing algorithms in this way quickly leads to large constant factors: if one proves that th…
External link:
http://arxiv.org/abs/2410.20538
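
As a rough illustration of the recursive method mentioned in the snippet above (and not of the paper's construction), the sketch below runs Strassen's rank-7 scheme for 2 x 2 blocks recursively and counts scalar operations; the many extra additions performed at every level of the recursion are one source of the large constant factors the snippet refers to.

import numpy as np

counts = {"mul": 0, "add": 0}

def strassen(A, B):
    """Strassen's recursion on 2^k x 2^k matrices, counting scalar operations."""
    n = A.shape[0]
    if n == 1:
        counts["mul"] += 1
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    def add(X, Y):
        counts["add"] += X.size
        return X + Y
    def sub(X, Y):
        counts["add"] += X.size
        return X - Y
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return np.block([[C11, C12], [C21, C22]])

A = np.random.randint(0, 10, (64, 64))
B = np.random.randint(0, 10, (64, 64))
C = strassen(A, B)
assert np.array_equal(C, A @ B)
print(counts)  # 7^6 = 117649 scalar multiplications, plus far more additions
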
Author:
Alman, Josh, Yu, Hantao
The Transformer architecture is widely deployed in many popular and impactful Large Language Models. At its core is the attention mechanism for calculating correlations between pairs of tokens. Performing an attention computation takes quadratic time …
External link:
http://arxiv.org/abs/2410.04271
Author:
Alman, Josh, Guan, Yunfeng
In batch Kernel Density Estimation (KDE) for a kernel function $f$, we are given as input $2n$ points $x^{(1)}, \cdots, x^{(n)}, y^{(1)}, \cdots, y^{(n)}$ in dimension $m$, as well as a vector $v \in \mathbb{R}^n$. These inputs implicitly define the …
External link:
http://arxiv.org/abs/2407.02372
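
The snippet above is cut off before the objective, so the sketch below assumes the usual convention that the points define an n x n kernel matrix K with K_{ij} = f(x^{(i)}, y^{(j)}) and that the task is to compute the matrix-vector product Kv; the Gaussian kernel is only an example choice, and all names here are illustrative rather than taken from the paper.

import numpy as np

def batch_kde(xs, ys, v, f):
    """Naive O(n^2 m) evaluation of (Kv)_i = sum_j f(x_i, y_j) * v_j."""
    n = len(xs)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            out[i] += f(xs[i], ys[j]) * v[j]
    return out

# Example kernel choice: Gaussian, f(x, y) = exp(-||x - y||^2).
gaussian = lambda x, y: np.exp(-np.sum((x - y) ** 2))

n, m = 100, 5
rng = np.random.default_rng(0)
xs, ys = rng.standard_normal((n, m)), rng.standard_normal((n, m))
v = rng.standard_normal(n)
print(batch_kde(xs, ys, v, gaussian)[:3])
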
Author:
Alman, Josh, Duan, Ran, Williams, Virginia Vassilevska, Xu, Yinzhan, Xu, Zixuan, Zhou, Renfei
We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly t…
External link:
http://arxiv.org/abs/2404.16349
Author:
Alman, Josh, Song, Zhao
Large language models (LLMs) have made fundamental contributions over the last few years. To train an LLM, one needs to alternately run `forward' computations and `backward' computations. The forward computation can be viewed as attention function …
External link:
http://arxiv.org/abs/2402.04497
Author:
Alman, Josh, Zhang, Hengjie
In the light bulb problem, one is given uniformly random vectors $x_1, \ldots, x_n, y_1, \ldots, y_n \in \{-1,1\}^d$. They are all chosen independently, except that a planted pair $(x_{i^*}, y_{j^*})$ is chosen with correlation $\rho>0$. The goal is to find …
External link:
http://arxiv.org/abs/2311.01630
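
A minimal sketch of the light bulb setup described above: it plants a correlated pair and recovers it with the naive O(n^2 d) scan over all inner products. The planting below makes each coordinate of the planted pair agree independently with probability (1 + rho)/2, which is one standard way to realize correlation rho; the exact model and the faster algorithms from the paper are not reproduced here.

import numpy as np

rng = np.random.default_rng(1)
n, d, rho = 500, 1000, 0.3

# Uniformly random +/-1 vectors.
X = rng.choice([-1, 1], size=(n, d))
Y = rng.choice([-1, 1], size=(n, d))

# Plant a correlated pair (x_{i*}, y_{j*}): each coordinate of y_{j*}
# agrees with x_{i*} independently with probability (1 + rho) / 2,
# so the expected inner product of the planted pair is rho * d.
i_star, j_star = rng.integers(n), rng.integers(n)
agree = rng.random(d) < (1 + rho) / 2
Y[j_star] = np.where(agree, X[i_star], -X[i_star])

# Naive search: the planted inner product concentrates near rho * d,
# while uncorrelated pairs fluctuate on the order of sqrt(d * log n).
scores = X @ Y.T  # all n^2 inner products, O(n^2 d) time
i_hat, j_hat = np.unravel_index(np.argmax(scores), scores.shape)
print((i_hat, j_hat) == (i_star, j_star))  # True with high probability
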
Author:
Alman, Josh, Song, Zhao
In the classical transformer attention scheme, we are given three $n \times d$ size matrices $Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ size matrix $D^{-1} \exp(QK^\top) V$ where $D = \mathrm{diag}(\exp(QK^\top) \mathbf{1}_n)$ …
External link:
http://arxiv.org/abs/2310.04064
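
The matrix described above is row-wise softmax attention. The sketch below is the obvious Theta(n^2 d)-time evaluation of D^{-1} exp(QK^T) V with D = diag(exp(QK^T) 1_n), i.e. the quadratic-time baseline these fine-grained results are about, not any of the faster algorithms or hardness constructions from the papers.

import numpy as np

def attention(Q, K, V):
    """Naive Theta(n^2 d) computation of D^{-1} exp(Q K^T) V,
    where D = diag(exp(Q K^T) 1_n), i.e. row-wise softmax attention."""
    A = np.exp(Q @ K.T)           # n x n matrix of exponentiated inner products
    D_inv = 1.0 / A.sum(axis=1)   # diagonal entries of D^{-1}
    return (D_inv[:, None] * A) @ V

n, d = 256, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (n, d)
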
Generalizing work of K\"unnemann, Paturi, and Schneider [ICALP 2017], we study a wide class of high-dimensional dynamic programming (DP) problems in which one must find the shortest path between two points in a high-dimensional grid given a tensor of …
External link:
http://arxiv.org/abs/2309.04683
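
The snippet above is truncated before the cost model, so the following is only a generic illustration of the kind of grid DP this problem class generalizes: the shortest path between opposite corners of a 2-dimensional grid, moving right or down and paying a cost read from a given cost matrix. The actual framework covers higher-dimensional grids and more general tensors of transition costs.

import numpy as np

def grid_shortest_path(cost):
    """DP for the cheapest path from the top-left to the bottom-right corner
    of a 2-D grid, moving only right or down and paying cost[i, j] on
    entering cell (i, j): dp[i, j] = cost[i, j] + min(dp[i-1, j], dp[i, j-1])."""
    n, m = cost.shape
    dp = np.full((n, m), np.inf)
    dp[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i > 0:
                dp[i, j] = min(dp[i, j], dp[i - 1, j] + cost[i, j])
            if j > 0:
                dp[i, j] = min(dp[i, j], dp[i, j - 1] + cost[i, j])
    return dp[n - 1, m - 1]

rng = np.random.default_rng(0)
print(grid_shortest_path(rng.integers(0, 10, size=(6, 6)).astype(float)))
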
Author:
Alman, Josh, Song, Zhao
In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Formally, in this problem, one is given as input three matrices $Q, K, V$ …
External link:
http://arxiv.org/abs/2302.13214