Showing 1 - 10 of 137 for search: '"Kale, Satyen"'
Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training…
External link:
http://arxiv.org/abs/2403.04978
Author:
Panigrahi, Abhishek, Saunshi, Nikunj, Lyu, Kaifeng, Miryoosefi, Sobhan, Reddi, Sashank, Kale, Satyen, Kumar, Sanjiv
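The abstract above describes stacking concretely enough to sketch. Below is a minimal, hedged Python/PyTorch illustration of one common variant (doubling the depth by copying the current stack); `grow_by_stacking` is an illustrative name, not code from the paper.

    import copy
    import torch.nn as nn

    def grow_by_stacking(layers: nn.ModuleList) -> nn.ModuleList:
        # Double the depth: append a parameter-for-parameter copy of the
        # current stack, so new layers start from trained weights rather
        # than from a random initialization.
        return nn.ModuleList(list(layers) + [copy.deepcopy(l) for l in layers])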
Recent developments in large language models have sparked interest in efficient pretraining methods. A recent effective paradigm is to perform stage-wise training, where the size of the model is gradually increased over the course of training (e.g. gradual stacking)…
External link:
http://arxiv.org/abs/2402.05913
Author:
Liu, Bo, Chhaparia, Rachita, Douillard, Arthur, Kale, Satyen, Rusu, Andrei A., Shen, Jiajun, Szlam, Arthur, Ranzato, Marc'Aurelio
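A schematic of the stage-wise paradigm just described: train at a small size, grow the model, continue training. All names here (`stagewise_pretrain`, `grow`, `train_steps`) are illustrative assumptions, not the paper's API.

    def stagewise_pretrain(model, stage_lengths, grow, train_steps):
        # Train for each stage at the current model size, growing the model
        # (e.g. by stacking) between consecutive stages.
        for i, num_steps in enumerate(stage_lengths):
            train_steps(model, num_steps)
            if i < len(stage_lengths) - 1:
                model = grow(model)
        return model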
Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of {\it asynchronous} Local-SGD…
External link:
http://arxiv.org/abs/2401.09135
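To make the "more than one SGD update per communication" point concrete, here is a bare-bones synchronous Local-SGD loop (the paper studies the asynchronous variant); `grad_fn` and the plain-numpy parameter averaging are schematic assumptions.

    import numpy as np

    def local_sgd(params_per_worker, grad_fn, lr=0.1, local_steps=8, rounds=100):
        # Each worker takes several SGD steps on its own data, then all
        # workers average parameters in a single communication round.
        for _ in range(rounds):
            for w in range(len(params_per_worker)):
                for _ in range(local_steps):  # more than one local update
                    params_per_worker[w] = (params_per_worker[w]
                                            - lr * grad_fn(w, params_per_worker[w]))
            avg = np.mean(params_per_worker, axis=0)  # one communication
            params_per_worker = [avg.copy() for _ in params_per_worker]
        return params_per_worker[0]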
We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history…
External link:
http://arxiv.org/abs/2312.11534
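For reference, the $(\epsilon, \delta)$-differential privacy guarantee the abstract invokes, stated in its textbook form for a randomized algorithm $\mathcal{A}$ and neighboring inputs $D, D'$ (standard background, not text from the paper); in OCO the output is the whole sequence of iterates, which is why each released decision can contribute to the privacy loss:

$$\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta \quad \text{for every measurable set } S.$$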
We study the Densest Subgraph (DSG) problem under the additional constraint of differential privacy. DSG is a fundamental theoretical question which plays a central role in graph analytics, and so privacy is a natural requirement. All known private algorithms…
External link:
http://arxiv.org/abs/2308.10316
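The underlying (non-private) DSG objective, included as standard background: for a graph $G = (V, E)$, with $E(S)$ the set of edges having both endpoints in $S$, one seeks

$$\max_{\emptyset \neq S \subseteq V} \frac{|E(S)|}{|S|},$$

i.e. the induced subgraph of highest average degree (up to a factor of 2).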
Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients…
External link:
http://arxiv.org/abs/2302.03109
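A bare-bones FedAvg round with partial client participation, to make the setting above concrete; the client-sampling scheme and `local_update` are schematic assumptions, not the paper's model.

    import random
    import numpy as np

    def fedavg_round(global_params, clients, local_update, num_sampled=10):
        # Partial participation: only a sampled subset of clients trains
        # this round; the server averages the returned parameters.
        sampled = random.sample(clients, num_sampled)
        local_params = [local_update(c, global_params.copy()) for c in sampled]
        return np.mean(local_params, axis=0)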
Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow…
External link:
http://arxiv.org/abs/2210.06705
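For context, gradient flow is the continuous-time idealization of gradient descent, while SGD replaces the exact gradient with a stochastic estimate; in standard notation (not necessarily the paper's):

$$\frac{d\theta(t)}{dt} = -\nabla L(\theta(t)), \qquad \theta_{k+1} = \theta_k - \eta\, g_k, \quad \mathbb{E}[g_k \mid \theta_k] = \nabla L(\theta_k).$$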
Published in:
Proceedings of Thirty Fifth Conference on Learning Theory (COLT), PMLR 178:3547-3588, 2022
Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum…
External link:
http://arxiv.org/abs/2207.02794
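To unpack the objective: $\langle A, B \rangle = \mathrm{tr}(A B^*)$ is the trace inner product, and $U\Lambda U^*$ ranges over matrices with the same spectrum as $\Lambda$. In the special case where $A$ is Hermitian with eigenvalues $\alpha_1 \ge \dots \ge \alpha_n$ and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ with $\lambda_1 \ge \dots \ge \lambda_n$, a classical trace inequality gives the closed form (background for the Hermitian case only, not the paper's general setting):

$$\max_{U \in \mathrm{U}(n)} \langle A, U\Lambda U^* \rangle = \sum_{i=1}^{n} \alpha_i \lambda_i.$$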
Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assuming…
External link:
http://arxiv.org/abs/2206.10713
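Uniform Lipschitzness is exactly the bound that the clipping step in standard DP-SGD enforces when it does not hold a priori; a generic numpy sketch (the clip norm, noise calibration, and `per_sample_grads` input are schematic assumptions):

    import numpy as np

    def dp_sgd_step(params, per_sample_grads, lr=0.1, clip=1.0, noise_mult=1.0):
        # Clip each per-sample gradient to norm at most `clip` (the bound
        # that uniform Lipschitzness would provide for free), then average
        # and add Gaussian noise scaled to the clip norm.
        clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
                   for g in per_sample_grads]
        mean_grad = np.mean(clipped, axis=0)
        sigma = noise_mult * clip / len(per_sample_grads)
        noise = np.random.normal(0.0, sigma, size=mean_grad.shape)
        return params - lr * (mean_grad + noise)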
Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingly…
External link:
http://arxiv.org/abs/2206.04723