Showing 1 - 10 of 54 for search: "Woodworth, Blake"
We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic …
External link:
http://arxiv.org/abs/2302.03542
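The entry above describes approximate proximal point iterations on a cheap proxy, corrected by occasional gradients of the true objective. The sketch below only illustrates that general pattern under toy assumptions (quadratic f and h, hand-picked step sizes and iteration counts); it is not the paper's exact method.

```python
# Illustrative sketch (assumed setup): prox steps on a cheap proxy h, with one
# expensive gradient of the true objective f per outer iteration.
import numpy as np

def grad_f(x):   # "hard" true gradient (toy quadratic with minimizer at 3)
    return 4.0 * (x - 3.0)

def grad_h(x):   # cheap proxy gradient with similar but not identical curvature
    return 3.5 * (x - 3.0)

def proxy_prox_point(x0, lam=0.5, outer_steps=20, inner_steps=50, lr=0.05):
    x = x0
    for _ in range(outer_steps):
        # One expensive gradient of f per outer iteration.
        correction = grad_f(x) - grad_h(x)
        # Approximately solve the proximal subproblem on the proxy:
        #   min_y  h(y) + <correction, y> + (1/(2*lam)) * ||y - x||^2
        y = x.copy()
        for _ in range(inner_steps):
            g = grad_h(y) + correction + (y - x) / lam
            y = y - lr * g
        x = y
    return x

print(proxy_prox_point(np.zeros(3)))  # should approach the minimizer at 3.0
```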
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same …
External link:
http://arxiv.org/abs/2206.07638
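For context on the entry above, here is a small toy of asynchronous SGD in which the server applies gradients computed at stale iterates. The fixed delay, quadratic objective, and step size are assumptions for illustration only, not the paper's setting.

```python
# Toy asynchronous SGD with delayed gradient application.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

def stochastic_grad(x):
    # Noisy gradient of f(x) = 0.5 * ||x||^2 (toy objective).
    return x + 0.1 * rng.standard_normal(x.shape)

def delayed_sgd(x0, steps=200, max_delay=5, lr=0.05):
    x = x0
    buffer = deque()  # gradients "in flight", applied max_delay steps late
    for _ in range(steps):
        buffer.append(stochastic_grad(x))   # a worker starts computing at the current x
        if len(buffer) > max_delay:
            x = x - lr * buffer.popleft()   # the server applies a stale gradient
    return x

print(np.linalg.norm(delayed_sgd(np.ones(10))))  # should shrink toward 0
```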
We consider potentially non-convex optimization problems, for which optimal rates of approximation depend on the dimension of the parameter space and the smoothness of the function to be optimized. In this paper, we propose an algorithm that achieves …
External link:
http://arxiv.org/abs/2204.04970
We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products …
External link:
http://arxiv.org/abs/2110.02954
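The entry above concerns a stochastic Newton method built on Hessian-vector products. A generic sketch of one such step, solving H d = g by conjugate gradient without ever forming the Hessian, is shown below; the least-squares objective and iteration counts are illustrative assumptions, and this is not the paper's specific distributed algorithm.

```python
# Sketch of a Newton-type step using only Hessian-vector products (assumed toy setup).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

def gradient(x):                    # gradient of 0.5 * ||Ax - b||^2
    return A.T @ (A @ x - b)

def hess_vec(x, v):                 # Hessian-vector product, no explicit Hessian
    return A.T @ (A @ v)

def newton_step(x, cg_iters=25):
    g = gradient(x)
    d = np.zeros_like(x)            # conjugate gradient for H d = g
    r = g - hess_vec(x, d)
    p = r.copy()
    for _ in range(cg_iters):
        if np.sqrt(r @ r) < 1e-10:  # stop once the system is solved
            break
        Hp = hess_vec(x, p)
        alpha = (r @ r) / (p @ Hp)
        d = d + alpha * p
        r_new = r - alpha * Hp
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x - d

x = newton_step(np.zeros(10))
print(np.linalg.norm(gradient(x)))  # near 0 after one (essentially exact) step
```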
Author:
Woodworth, Blake
In this thesis, I study the minimax oracle complexity of distributed stochastic optimization. First, I present the "graph oracle model", an extension of the classic oracle complexity framework that can be applied to study distributed optimization algorithms …
External link:
http://arxiv.org/abs/2109.00534
Author:
Wang, Jianyu, Charles, Zachary, Xu, Zheng, Joshi, Gauri, McMahan, H. Brendan, Arcas, Blaise Aguera y, Al-Shedivat, Maruan, Andrew, Galen, Avestimehr, Salman, Daly, Katharine, Data, Deepesh, Diggavi, Suhas, Eichner, Hubert, Gadhikar, Advait, Garrett, Zachary, Girgis, Antonious M., Hanzely, Filip, Hard, Andrew, He, Chaoyang, Horvath, Samuel, Huo, Zhouyuan, Ingerman, Alex, Jaggi, Martin, Javidi, Tara, Kairouz, Peter, Kale, Satyen, Karimireddy, Sai Praneeth, Konecny, Jakub, Koyejo, Sanmi, Li, Tian, Liu, Luyang, Mohri, Mehryar, Qi, Hang, Reddi, Sashank J., Richtarik, Peter, Singhal, Karan, Smith, Virginia, Soltanolkotabi, Mahdi, Song, Weikang, Suresh, Ananda Theertha, Stich, Sebastian U., Talwalkar, Ameet, Wang, Hongyi, Woodworth, Blake, Wu, Shanshan, Yu, Felix X., Yuan, Honglin, Zaheer, Manzil, Zhang, Mi, Zhang, Tong, Zheng, Chunxiang, Zhu, Chen, Zhu, Wennan
Federated learning and analytics are a distributed approach for collaboratively learning models (or statistics) from decentralized data, motivated by and designed for privacy protection. The distributed learning process can be formulated as solving …
External link:
http://arxiv.org/abs/2107.06917
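As a rough illustration of the federated optimization formulation mentioned above (local training on clients followed by server averaging), a minimal federated-averaging toy might look as follows. The quadratic client objectives and all constants are assumptions, and none of the privacy or systems aspects are modeled.

```python
# Toy federated averaging: local SGD on each client, then averaging on the server.
import numpy as np

rng = np.random.default_rng(2)
# Each "client" holds its own quadratic objective 0.5 * ||x - c_k||^2.
client_centers = [rng.standard_normal(5) for _ in range(8)]

def local_update(x, center, local_steps=10, lr=0.1):
    for _ in range(local_steps):
        x = x - lr * (x - center)            # gradient of the client's local objective
    return x

def fed_avg(rounds=30):
    x = np.zeros(5)
    for _ in range(rounds):
        client_models = [local_update(x.copy(), c) for c in client_centers]
        x = np.mean(client_models, axis=0)   # server aggregates by averaging
    return x

print(fed_avg() - np.mean(client_centers, axis=0))  # should be near zero
```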
Author:
Woodworth, Blake, Srebro, Nathan
We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates. The algorithm is optimal with respect to its dependence on both the minibatch size and minimum expected …
External link:
http://arxiv.org/abs/2106.02720
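For reference alongside the entry above, a plain minibatch SGD baseline on a least-squares problem is sketched below. It is not the optimal algorithm analyzed in the paper, just the standard minibatch stochastic gradient scheme, with assumed data and constants.

```python
# Plain minibatch SGD on a synthetic least-squares problem (assumed setup).
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 20))
w_true = rng.standard_normal(20)
y = X @ w_true + 0.05 * rng.standard_normal(1000)

def minibatch_sgd(batch_size=64, steps=500, lr=0.01):
    w = np.zeros(20)
    for _ in range(steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size   # minibatch gradient
        w = w - lr * grad
    return w

print(np.linalg.norm(minibatch_sgd() - w_true))  # small residual error
```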
Author:
Azulay, Shahar, Moroshko, Edward, Nacson, Mor Shpigel, Woodworth, Blake, Srebro, Nathan, Globerson, Amir, Soudry, Daniel
Recent work has highlighted the role of initialization scale in determining the structure of the solutions that gradient methods converge to. In particular, it was shown that large initialization leads to the neural tangent kernel regime solution, whereas …
External link:
http://arxiv.org/abs/2102.09769
The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication
We resolve the min-max complexity of distributed stochastic convex optimization (up to a log factor) in the intermittent communication setting, where $M$ machines work in parallel over the course of $R$ rounds of communication to optimize the objective …
External link:
http://arxiv.org/abs/2102.01583
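The intermittent communication setting referenced above can be mimicked with a small local-SGD toy: M machines each take K local stochastic gradient steps per round and synchronize only at the R communication rounds. The objective and all constants below are illustrative assumptions.

```python
# Toy intermittent-communication run: M machines, R rounds, K local steps per round.
import numpy as np

rng = np.random.default_rng(4)
M, R, K, lr = 4, 50, 10, 0.05

def stochastic_grad(x):
    return x + 0.1 * rng.standard_normal(x.shape)   # noisy gradient of 0.5 * ||x||^2

x = np.ones(5)
for _ in range(R):                       # R communication rounds
    local_models = []
    for _ in range(M):                   # M machines work in parallel
        xm = x.copy()
        for _ in range(K):               # K local steps between communications
            xm = xm - lr * stochastic_grad(xm)
        local_models.append(xm)
    x = np.mean(local_models, axis=0)    # synchronize by averaging
print(np.linalg.norm(x))                 # should be near the noise floor
```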
Author:
Moroshko, Edward, Gunasekar, Suriya, Woodworth, Blake, Lee, Jason D., Srebro, Nathan, Soudry, Daniel
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This is the simplest model displaying a transition between "kernel" and non-kernel …
External link:
http://arxiv.org/abs/2007.06738
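The entry above studies gradient flow on diagonal linear networks. A rough numerical sketch of that object, gradient descent on a predictor parametrized as w = u*u - v*v trained with the exponential loss, appears below; the data, learning rate, and initialization scales are assumptions chosen only to hint at how initialization scale can change the learned solution.

```python
# Gradient descent on a diagonal linear network w = u*u - v*v with exponential loss.
import numpy as np

rng = np.random.default_rng(5)
n, d = 20, 30
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:3] = 1.0                                # sparse "ground truth" direction
y = np.sign(X @ w_star)

def train_diag_net(alpha, steps=50000, lr=0.005):
    # Initialize both factors at scale alpha, so the predictor starts at w = 0.
    u = alpha * np.ones(d)
    v = alpha * np.ones(d)
    for _ in range(steps):
        w = u * u - v * v
        margins = y * (X @ w)
        # Gradient of the mean exponential loss with respect to w.
        grad_w = -(X * (y * np.exp(-margins))[:, None]).mean(axis=0)
        u -= lr * (2 * u) * grad_w              # chain rule through w = u^2 - v^2
        v -= lr * (-2 * v) * grad_w
    return u * u - v * v

for alpha in (0.01, 1.0):                       # small vs. large initialization scale
    w = train_diag_net(alpha)
    frac = np.abs(w[:3]).sum() / np.abs(w).sum()
    print(f"alpha={alpha}: weight fraction on informative coordinates = {frac:.2f}")
```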