Showing 1 - 10 of 223 for search: '"Goldfarb, Donald"'
Author:
Bahamou, Achraf, Goldfarb, Donald
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization methods for minimizing empirical loss functions in deep learning, eliminating the need for the user to tune the learning rate (LR). The proposed approach ...
External link:
http://arxiv.org/abs/2305.13664
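As a purely illustrative sketch of the per-layer idea (the norm-ratio step-size heuristic below is an assumption for the example, not the procedure proposed in the paper), each layer gets its own step size computed from its own parameters and gradient, so no global learning rate needs tuning:

    import numpy as np

    def layer_step_size(w, g, scale=1e-3, eps=1e-8):
        # Illustrative heuristic only: step size proportional to ||w|| / ||g||,
        # computed separately for each layer.
        return scale * np.linalg.norm(w) / (np.linalg.norm(g) + eps)

    def per_layer_sgd_step(params, grads):
        # params, grads: lists of per-layer arrays; each layer uses its own step size.
        return [w - layer_step_size(w, g) * g for w, g in zip(params, grads)]
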
Deep neural networks (DNNs) are currently predominantly trained using first-order methods. Some of these methods (e.g., Adam, AdaGrad, and RMSprop, and their variants) incorporate a small amount of curvature information by using a diagonal matrix to ...
External link:
http://arxiv.org/abs/2202.04124
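For context, the diagonal matrix mentioned in the snippet is the one Adam-style methods build from a running average of squared gradients; a minimal sketch of that standard Adam update (textbook Adam, not the method proposed in the paper above):

    import numpy as np

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # m, v: running first and second moments of the gradient; sqrt(v) acts as
        # a diagonal curvature estimate that rescales each coordinate of the step.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        m_hat = m / (1 - b1 ** t)   # bias correction
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
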
Author:
Ren, Yi, Goldfarb, Donald
Despite the predominant use of first-order methods for training deep learning models, second-order methods, and in particular, natural gradient methods, remain of interest because of their potential for accelerating training through the use of curvature ...
External link:
http://arxiv.org/abs/2106.02925
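For reference, the natural-gradient update referred to in the snippet preconditions the stochastic gradient with the inverse Fisher information matrix (standard notation, not specific to this paper):

    \theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
    \qquad F(\theta) = \mathbb{E}\!\left[ \nabla_\theta \log p(y \mid x; \theta)\, \nabla_\theta \log p(y \mid x; \theta)^{\top} \right]
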
Second-order methods have the capability of accelerating optimization by using much richer curvature information than first-order methods. However, most are impractical for deep learning, where the number of training parameters is huge. In Goldfarb et al. ...
External link:
http://arxiv.org/abs/2102.06737
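For background on the quasi-Newton machinery these papers build on, the standard BFGS update of the inverse Hessian approximation H_k, using the step s_k = w_{k+1} - w_k and gradient change y_k = g_{k+1} - g_k, is (textbook formula, not the paper's contribution):

    H_{k+1} = \left(I - \rho_k s_k y_k^{\top}\right) H_k \left(I - \rho_k y_k s_k^{\top}\right) + \rho_k s_k s_k^{\top},
    \qquad \rho_k = \frac{1}{y_k^{\top} s_k}
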
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). In DNN training, the number of variables and components of the gradient ...
External link:
http://arxiv.org/abs/2006.08877
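One reason Kronecker-factored blocks are practical is that a factored curvature block A (Kronecker) B never has to be formed or inverted explicitly. A small numerical check of the underlying identity (the matrices here are random placeholders, not quantities from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 3, 4
    G = rng.standard_normal((m, n))      # gradient of one layer, as a matrix

    # Symmetric positive definite Kronecker factors (placeholders for the example).
    A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)
    B = rng.standard_normal((m, m)); B = B @ B.T + m * np.eye(m)

    # Applying the inverse of the full (m*n x m*n) block kron(A, B) to vec(G) ...
    direct = np.linalg.solve(np.kron(A, B), G.flatten(order="F"))

    # ... equals the cheap matrix-level computation B^{-1} G A^{-1}.
    cheap = (np.linalg.solve(B, G) @ np.linalg.inv(A)).flatten(order="F")
    assert np.allclose(direct, cheap)
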
Author:
Bahamou, Achraf, Goldfarb, Donald
We propose a stochastic optimization method for minimizing loss functions, expressed as an expected value, that adaptively controls the batch size used in the computation of gradient approximations and the step size used to move along such directions ...
External link:
http://arxiv.org/abs/1912.13357
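Purely as an illustration of coupling the batch size to gradient quality (the variance test below is a generic adaptive-sampling heuristic assumed for this sketch, not the control rules from the paper), one can grow the batch whenever the per-sample gradients disagree too much:

    import numpy as np

    def maybe_grow_batch(per_example_grads, batch_size, theta=1.0, max_batch=4096):
        # per_example_grads: array of shape (batch_size, dim), one gradient per sample.
        g_bar = per_example_grads.mean(axis=0)
        # Variance of the averaged gradient estimate, compared to its squared norm.
        var = per_example_grads.var(axis=0).sum() / batch_size
        if var > theta ** 2 * np.dot(g_bar, g_bar):
            batch_size = min(2 * batch_size, max_batch)  # too noisy: enlarge the batch
        return batch_size, g_bar
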
Author:
Ren, Yi, Goldfarb, Donald
We present practical Levenberg-Marquardt variants of Gauss-Newton and natural gradient methods for solving non-convex optimization problems that arise in training deep neural networks involving enormous numbers of variables and huge data sets. Our methods ...
External link:
http://arxiv.org/abs/1906.02353
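For reference, the classical Levenberg-Marquardt step for a least-squares residual r(w) with Jacobian J interpolates between a Gauss-Newton step and a short gradient-like step (standard formulation; making this tractable at DNN scale is what the paper addresses):

    import numpy as np

    def levenberg_marquardt_step(J, r, lam):
        # Solve (J^T J + lam * I) p = -J^T r.
        # lam -> 0 recovers Gauss-Newton; large lam gives a damped, gradient-like step.
        n = J.shape[1]
        return np.linalg.solve(J.T @ J + lam * np.eye(n), -J.T @ r)
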
Author:
Teng, Yunfei, Gao, Wenbo, Chalus, Francois, Choromanska, Anna, Goldfarb, Donald, Weller, Adrian
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the current ...
External link:
http://arxiv.org/abs/1905.10395
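As a heavily simplified illustration of combining a local gradient step with a corrective direction (the pull-toward-a-reference term and the coefficient `pull` below are assumptions for this sketch, not the algorithm defined in the paper):

    import numpy as np

    def worker_update(w_local, grad_local, w_ref, lr=0.1, pull=0.1):
        # Two forces: the usual gradient step, plus a corrective pull toward a
        # reference point w_ref (e.g., a currently best-performing worker).
        return w_local - lr * grad_local - pull * (w_local - w_ref)
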
Many problems in machine learning and game theory can be formulated as saddle-point problems, for which various first-order methods have been developed and proven efficient in practice. Under the general convex-concave assumption, most first-order methods ...
External link:
http://arxiv.org/abs/1903.10646
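The generic problem referred to in the snippet is the convex-concave saddle-point problem, and the simplest first-order scheme for it is gradient descent-ascent (standard formulation, not this paper's method):

    \min_{x} \max_{y}\; f(x, y), \qquad
    x_{t+1} = x_t - \eta\, \nabla_x f(x_t, y_t), \qquad
    y_{t+1} = y_t + \eta\, \nabla_y f(x_t, y_t)
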
We expand the scope of the alternating direction method of multipliers (ADMM). Specifically, we show that ADMM, when employed to solve problems with multiaffine constraints that satisfy certain verifiable assumptions, converges to the set of constrained stationary points ...
External link:
http://arxiv.org/abs/1802.09592
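For reference, the classical two-block ADMM iteration for \min_{x,z} f(x) + g(z) subject to Ax + Bz = c, which the paper extends to multiaffine constraints, is (standard form):

    x^{k+1} = \arg\min_x \; L_\rho(x, z^k, y^k), \qquad
    z^{k+1} = \arg\min_z \; L_\rho(x^{k+1}, z, y^k), \qquad
    y^{k+1} = y^k + \rho\,(A x^{k+1} + B z^{k+1} - c),

    \text{where}\quad L_\rho(x, z, y) = f(x) + g(z) + y^{\top}(Ax + Bz - c) + \tfrac{\rho}{2}\,\|Ax + Bz - c\|^2.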