Zobrazeno 1 - 10
of 102
pro vyhledávání: '"Zhong, Peilin"'
Fine-tuning language models (LMs) with the Adam optimizer often demands excessive memory, limiting accessibility. The "in-place" version of Stochastic Gradient Descent (IP-SGD) and Memory-Efficient Zeroth-order Optimizer (MeZO) have been proposed to
Externí odkaz:
http://arxiv.org/abs/2410.06441
Autor:
Azarmehr, Amir, Behnezhad, Soheil, Jayaram, Rajesh, Łącki, Jakub, Mirrokni, Vahab, Zhong, Peilin
We study the minimum spanning tree (MST) problem in the massively parallel computation (MPC) model. Our focus is particularly on the *strictly sublinear* regime of MPC where the space per machine is $O(n^\delta)$. Here $n$ is the number of vertices a
Externí odkaz:
http://arxiv.org/abs/2408.06455
Autor:
Das, Rudrajit, Dhillon, Inderjit S., Epasto, Alessandro, Javanmard, Adel, Mao, Jieming, Mirrokni, Vahab, Sanghavi, Sujay, Zhong, Peilin
The performance of a model trained with \textit{noisy labels} is often improved by simply \textit{retraining} the model with its own predicted \textit{hard} labels (i.e., $1$/$0$ labels). Yet, a detailed theoretical characterization of this phenomeno
Externí odkaz:
http://arxiv.org/abs/2406.11206
We revisit the input perturbations framework for differential privacy where noise is added to the input $A\in \mathcal{S}$ and the result is then projected back to the space of admissible datasets $\mathcal{S}$. Through this framework, we first desig
Externí odkaz:
http://arxiv.org/abs/2406.04868
We study streaming algorithms for the $\ell_p$ subspace approximation problem. Given points $a_1, \ldots, a_n$ as an insertion-only stream and a rank parameter $k$, the $\ell_p$ subspace approximation problem is to find a $k$-dimensional subspace $V$
Externí odkaz:
http://arxiv.org/abs/2406.02910
In the coordinator model of communication with $s$ servers, given an arbitrary non-negative function $f$, we study the problem of approximating the sum $\sum_{i \in [n]}f(x_i)$ up to a $1 \pm \varepsilon$ factor. Here the vector $x \in R^n$ is define
Externí odkaz:
http://arxiv.org/abs/2403.20307
Autor:
Agarwal, Arpit, Khanna, Sanjeev, Li, Huan, Patil, Prathamesh, Wang, Chen, White, Nathan, Zhong, Peilin
We present a parallel algorithm for the $(1-\epsilon)$-approximate maximum flow problem in capacitated, undirected graphs with $n$ vertices and $m$ edges, achieving $O(\epsilon^{-3}\text{polylog} n)$ depth and $O(m \epsilon^{-3} \text{polylog} n)$ wo
Externí odkaz:
http://arxiv.org/abs/2402.14950
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data can provide more accurate information and thus older d
Externí odkaz:
http://arxiv.org/abs/2311.00642
The quadratic time and memory complexity inherent to self-attention mechanisms, with respect to sequence length, presents a critical computational bottleneck in the training and deployment of large-scale Transformer-based language models. Recent theo
Externí odkaz:
http://arxiv.org/abs/2310.01655
We study the classic Euclidean Minimum Spanning Tree (MST) problem in the Massively Parallel Computation (MPC) model. Given a set $X \subset \mathbb{R}^d$ of $n$ points, the goal is to produce a spanning tree for $X$ with weight within a small factor
Externí odkaz:
http://arxiv.org/abs/2308.00503