Výsledky vyhledávání - "Kale, Satyen"

Report

Stacking as Accelerated Gradient Descent

Autor: Agarwal, Naman, Awasthi, Pranjal, Kale, Satyen, Zhao, Eric

Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of train

Externí odkaz: http://arxiv.org/abs/2403.04978

Zobrazit plný text záznamu

Report

Efficient Stagewise Pretraining via Progressive Subnetworks

Autor: Panigrahi, Abhishek, Saunshi, Nikunj, Lyu, Kaifeng, Miryoosefi, Sobhan, Reddi, Sashank, Kale, Satyen, Kumar, Sanjiv

Recent developments in large language models have sparked interest in efficient pretraining methods. A recent effective paradigm is to perform stage-wise training, where the size of the model is gradually increased over the course of training (e.g. g

Externí odkaz: http://arxiv.org/abs/2402.05913

Zobrazit plný text záznamu

Report

Asynchronous Local-SGD Training for Language Modeling

Autor: Liu, Bo, Chhaparia, Rachita, Douillard, Arthur, Kale, Satyen, Rusu, Andrei A., Shen, Jiajun, Szlam, Arthur, Ranzato, Marc'Aurelio

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of {\it as

Externí odkaz: http://arxiv.org/abs/2401.09135

Zobrazit plný text záznamu

Report

Improved Differentially Private and Lazy Online Convex Optimization

Autor: Agarwal, Naman, Kale, Satyen, Singh, Karan, Thakurta, Abhradeep Guha

We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO). In the online setting, the release of each distinct decision or iterate carries with it the potential for privacy loss. This problem has a long history

Externí odkaz: http://arxiv.org/abs/2312.11534

Zobrazit plný text záznamu

Report

Almost Tight Bounds for Differentially Private Densest Subgraph

Autor: Dinitz, Michael, Kale, Satyen, Lattanzi, Silvio, Vassilvitskii, Sergei

We study the Densest Subgraph (DSG) problem under the additional constraint of differential privacy. DSG is a fundamental theoretical question which plays a central role in graph analytics, and so privacy is a natural requirement. All known private a

Externí odkaz: http://arxiv.org/abs/2308.10316

Zobrazit plný text záznamu

Report

On the Convergence of Federated Averaging with Cyclic Client Participation

Autor: Cho, Yae Jee, Sharma, Pranay, Joshi, Gauri, Xu, Zheng, Kale, Satyen, Zhang, Tong

Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients c

Externí odkaz: http://arxiv.org/abs/2302.03109

Zobrazit plný text záznamu

Report

From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent

Autor: Kale, Satyen, Lee, Jason D., De Sa, Chris, Sekhari, Ayush, Sridharan, Karthik

Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradi

Externí odkaz: http://arxiv.org/abs/2210.06705

Zobrazit plný text záznamu

Report

Private Matrix Approximation and Geometry of Unitary Orbits

Autor: Mangoubi, Oren, Wu, Yikai, Kale, Satyen, Thakurta, Abhradeep Guha, Vishnoi, Nisheeth K.

Publikováno v: Proceedings of Thirty Fifth Conference on Learning Theory (COLT), PMLR 178:3547-3588, 2022

Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spe

Externí odkaz: http://arxiv.org/abs/2207.02794

Zobrazit plný text záznamu

Report

Beyond Uniform Lipschitz Condition in Differentially Private Optimization

Autor: Das, Rudrajit, Kale, Satyen, Xu, Zheng, Zhang, Tong, Sanghavi, Sujay

Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assum

Externí odkaz: http://arxiv.org/abs/2206.10713

Zobrazit plný text záznamu

Report

On the Unreasonable Effectiveness of Federated Averaging with Heterogeneous Data

Autor: Wang, Jianyu, Das, Rudrajit, Joshi, Gauri, Kale, Satyen, Xu, Zheng, Zhang, Tong

Existing theory predicts that data heterogeneity will degrade the performance of the Federated Averaging (FedAvg) algorithm in federated learning. However, in practice, the simple FedAvg algorithm converges very well. This paper explains the seemingl

Externí odkaz: http://arxiv.org/abs/2206.04723

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání