Showing 1 - 10 of 56 for search: '"Alman, Josh"'
We study the average-case version of the Orthogonal Vectors problem, in which one is given as input $n$ vectors from $\{0,1\}^d$, chosen randomly so that each coordinate is $1$ independently with probability $p$. Kane and Williams [ITCS 2019] …
External link:
http://arxiv.org/abs/2410.22477
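
The setup in the snippet above is concrete enough to illustrate. Below is a minimal sketch (not taken from the paper) that samples an average-case Orthogonal Vectors instance with the stated coordinate distribution and looks for an orthogonal pair by the naive quadratic scan; the names n, d, p follow the abstract, while the function names and parameter values are illustrative. The algorithms studied in such papers aim to beat this quadratic baseline.

import random

def random_instance(n, d, p, seed=0):
    """n vectors from {0,1}^d, each coordinate 1 independently with probability p."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for _ in range(d)] for _ in range(n)]

def has_orthogonal_pair(vectors):
    """Naive O(n^2 d) scan: is there a pair with zero inner product?"""
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            if all(a * b == 0 for a, b in zip(vectors[i], vectors[j])):
                return True
    return False

vecs = random_instance(n=200, d=20, p=0.3)
print(has_orthogonal_pair(vecs))
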
Author:
Alman, Josh, Yu, Hantao
Algebraic matrix multiplication algorithms are designed by bounding the rank of matrix multiplication tensors, and then using a recursive method. However, designing algorithms in this way quickly leads to large constant factors: if one proves that th…
External link:
http://arxiv.org/abs/2410.20538
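
As a rough illustration of the recursive method mentioned in the snippet above (and not of the paper's construction), the sketch below runs Strassen's rank-7 scheme for 2 x 2 blocks recursively and counts scalar operations; the many extra additions performed at every level of the recursion are one source of the large constant factors the snippet refers to.

import numpy as np

counts = {"mul": 0, "add": 0}

def strassen(A, B):
    """Strassen's recursion on 2^k x 2^k matrices, counting scalar operations."""
    n = A.shape[0]
    if n == 1:
        counts["mul"] += 1
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    def add(X, Y):
        counts["add"] += X.size
        return X + Y
    def sub(X, Y):
        counts["add"] += X.size
        return X - Y
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    return np.block([[C11, C12], [C21, C22]])

A = np.random.randint(0, 10, (64, 64))
B = np.random.randint(0, 10, (64, 64))
C = strassen(A, B)
assert np.array_equal(C, A @ B)
print(counts)  # 7^6 = 117649 scalar multiplications, plus far more additions
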
Author:
Alman, Josh, Yu, Hantao
The Transformer architecture is widely deployed in many popular and impactful Large Language Models. At its core is the attention mechanism for calculating correlations between pairs of tokens. Performing an attention computation takes quadratic time …
External link:
http://arxiv.org/abs/2410.04271
Author:
Alman, Josh, Guan, Yunfeng
In batch Kernel Density Estimation (KDE) for a kernel function $f$, we are given as input $2n$ points $x^{(1)}, \cdots, x^{(n)}, y^{(1)}, \cdots, y^{(n)}$ in dimension $m$, as well as a vector $v \in \mathbb{R}^n$. These inputs implicitly define the …
External link:
http://arxiv.org/abs/2407.02372
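
The snippet above is cut off before the objective, so the sketch below assumes the usual convention that the points define an n x n kernel matrix K with K_{ij} = f(x^{(i)}, y^{(j)}) and that the task is to compute the matrix-vector product Kv; the Gaussian kernel is only an example choice, and all names here are illustrative rather than taken from the paper.

import numpy as np

def batch_kde(xs, ys, v, f):
    """Naive O(n^2 m) evaluation of (Kv)_i = sum_j f(x_i, y_j) * v_j."""
    n = len(xs)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):
            out[i] += f(xs[i], ys[j]) * v[j]
    return out

# Example kernel choice: Gaussian, f(x, y) = exp(-||x - y||^2).
gaussian = lambda x, y: np.exp(-np.sum((x - y) ** 2))

n, m = 100, 5
rng = np.random.default_rng(0)
xs, ys = rng.standard_normal((n, m)), rng.standard_normal((n, m))
v = rng.standard_normal(n)
print(batch_kde(xs, ys, v, gaussian)[:3])
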
Author:
Alman, Josh, Duan, Ran, Williams, Virginia Vassilevska, Xu, Yinzhan, Xu, Zixuan, Zhou, Renfei
We present a new improvement on the laser method for designing fast matrix multiplication algorithms. The new method further develops the recent advances by [Duan, Wu, Zhou FOCS 2023] and [Vassilevska Williams, Xu, Xu, Zhou SODA 2024]. Surprisingly t…
External link:
http://arxiv.org/abs/2404.16349
Author:
Alman, Josh, Song, Zhao
Large language models (LLMs) have made fundamental contributions over the last few years. To train an LLM, one needs to alternately run `forward' computations and `backward' computations. The forward computation can be viewed as attention function …
External link:
http://arxiv.org/abs/2402.04497
Author:
Alman, Josh, Zhang, Hengjie
In the light bulb problem, one is given uniformly random vectors $x_1, \ldots, x_n, y_1, \ldots, y_n \in \{-1,1\}^d$. They are all chosen independently, except that a planted pair $(x_{i^*}, y_{j^*})$ is chosen with correlation $\rho>0$. The goal is to find …
External link:
http://arxiv.org/abs/2311.01630
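
A minimal sketch of the light bulb setup described above: it plants a correlated pair and recovers it with the naive O(n^2 d) scan over all inner products. The planting below makes each coordinate of the planted pair agree independently with probability (1 + rho)/2, which is one standard way to realize correlation rho; the exact model and the faster algorithms from the paper are not reproduced here.

import numpy as np

rng = np.random.default_rng(1)
n, d, rho = 500, 1000, 0.3

# Uniformly random +/-1 vectors.
X = rng.choice([-1, 1], size=(n, d))
Y = rng.choice([-1, 1], size=(n, d))

# Plant a correlated pair (x_{i*}, y_{j*}): each coordinate of y_{j*}
# agrees with x_{i*} independently with probability (1 + rho) / 2,
# so the expected inner product of the planted pair is rho * d.
i_star, j_star = rng.integers(n), rng.integers(n)
agree = rng.random(d) < (1 + rho) / 2
Y[j_star] = np.where(agree, X[i_star], -X[i_star])

# Naive search: the planted inner product concentrates near rho * d,
# while uncorrelated pairs fluctuate on the order of sqrt(d * log n).
scores = X @ Y.T  # all n^2 inner products, O(n^2 d) time
i_hat, j_hat = np.unravel_index(np.argmax(scores), scores.shape)
print((i_hat, j_hat) == (i_star, j_star))  # True with high probability
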
Author:
Alman, Josh, Song, Zhao
In the classical transformer attention scheme, we are given three $n \times d$ size matrices $Q, K, V$ (the query, key, and value tokens), and the goal is to compute a new $n \times d$ size matrix $D^{-1} \exp(QK^\top) V$ where $D = \mathrm{diag}(\exp(QK^\top) \mathbf{1}_n)$ …
External link:
http://arxiv.org/abs/2310.04064
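
The matrix described above is row-wise softmax attention. The sketch below is the obvious Theta(n^2 d)-time evaluation of D^{-1} exp(QK^T) V with D = diag(exp(QK^T) 1_n), i.e. the quadratic-time baseline these fine-grained results are about, not any of the faster algorithms or hardness constructions from the papers.

import numpy as np

def attention(Q, K, V):
    """Naive Theta(n^2 d) computation of D^{-1} exp(Q K^T) V,
    where D = diag(exp(Q K^T) 1_n), i.e. row-wise softmax attention."""
    A = np.exp(Q @ K.T)           # n x n matrix of exponentiated inner products
    D_inv = 1.0 / A.sum(axis=1)   # diagonal entries of D^{-1}
    return (D_inv[:, None] * A) @ V

n, d = 256, 16
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(attention(Q, K, V).shape)  # (n, d)
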
Generalizing work of K\"unnemann, Paturi, and Schneider [ICALP 2017], we study a wide class of high-dimensional dynamic programming (DP) problems in which one must find the shortest path between two points in a high-dimensional grid given a tensor of …
External link:
http://arxiv.org/abs/2309.04683
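
The snippet above is truncated before the cost model, so the following is only a generic illustration of the kind of grid DP this problem class generalizes: the shortest path between opposite corners of a 2-dimensional grid, moving right or down and paying a cost read from a given cost matrix. The actual framework covers higher-dimensional grids and more general tensors of transition costs.

import numpy as np

def grid_shortest_path(cost):
    """DP for the cheapest path from the top-left to the bottom-right corner
    of a 2-D grid, moving only right or down and paying cost[i, j] on
    entering cell (i, j): dp[i, j] = cost[i, j] + min(dp[i-1, j], dp[i, j-1])."""
    n, m = cost.shape
    dp = np.full((n, m), np.inf)
    dp[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i > 0:
                dp[i, j] = min(dp[i, j], dp[i - 1, j] + cost[i, j])
            if j > 0:
                dp[i, j] = min(dp[i, j], dp[i, j - 1] + cost[i, j])
    return dp[n - 1, m - 1]

rng = np.random.default_rng(0)
print(grid_shortest_path(rng.integers(0, 10, size=(6, 6)).astype(float)))
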
Author:
Alman, Josh, Song, Zhao
In modern machine learning, inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. Formally, in this problem, one is given as input three matrices $Q, K, V$ …
External link:
http://arxiv.org/abs/2302.13214