Showing 1 - 10 of 25 for search: '"Dangel, Felix"'
The Transformer architecture has inarguably revolutionized deep learning, overtaking classical architectures like multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs). At its core, the attention block differs in form and functionality …
External link: http://arxiv.org/abs/2410.10986
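To make the contrast mentioned in this entry concrete, here is a minimal, illustrative PyTorch sketch (not taken from the paper) of a scaled dot-product self-attention block next to a plain dense layer; all shapes and names are chosen for the example only.

# Minimal sketch (illustrative, not from the paper): scaled dot-product
# self-attention, to contrast its form with a plain dense (MLP) layer.
import torch

def self_attention(X, Wq, Wk, Wv):
    # X: (sequence_length, embed_dim); Wq/Wk/Wv: (embed_dim, embed_dim)
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / K.shape[-1] ** 0.5      # pairwise token interactions
    return torch.softmax(scores, dim=-1) @ V   # data-dependent mixing of tokens

def dense_layer(X, W):
    return torch.relu(X @ W)                   # fixed, token-wise linear map

X = torch.randn(5, 8)
W = torch.randn(8, 8)
print(self_attention(X, W, W.clone(), W.clone()).shape, dense_layer(X, W).shape)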
Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a gradient …
External link: http://arxiv.org/abs/2406.03276
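As a rough illustration of the entry above (not the paper's method), the following sketch estimates a Hessian diagonal with Hutchinson's estimator, diag(H) ≈ E[v ⊙ (Hv)] for Rademacher v; each sample requires an extra Hessian-vector product on top of the gradient, which is the kind of overhead the abstract alludes to. The quadratic toy loss is made up for the example.

# Minimal sketch: Hutchinson-style Hessian-diagonal estimation.
import torch

def loss_fn(w, X, y):
    return ((X @ w - y) ** 2).mean()

torch.manual_seed(0)
X, y = torch.randn(32, 10), torch.randn(32)
w = torch.randn(10, requires_grad=True)

loss = loss_fn(w, X, y)
(grad,) = torch.autograd.grad(loss, w, create_graph=True)

diag_estimate = torch.zeros_like(w)
num_samples = 50
for _ in range(num_samples):
    v = (torch.randint(0, 2, w.shape) * 2 - 1).to(w.dtype)     # Rademacher vector
    (hv,) = torch.autograd.grad(grad @ v, w, retain_graph=True) # Hessian-vector product
    diag_estimate += v * hv / num_samples

exact_diag = 2 * (X ** 2).mean(dim=0)  # exact Hessian diagonal of this quadratic loss
print(diag_estimate, exact_diag)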
Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods …
External link: http://arxiv.org/abs/2405.15603
Author: Bhatia, Samarth; Dangel, Felix
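For orientation only, and not the algorithm proposed in the paper, the sketch below takes one damped Gauss-Newton step on a least-squares residual of the kind PINN training minimizes, Δw = -(JᵀJ + λI)⁻¹ Jᵀ r; the residual function is a stand-in.

# Illustrative sketch: one damped Gauss-Newton step on L(w) = 0.5 * ||r(w)||^2.
import torch

def residual(w, X, y):
    # Stand-in for a PDE/boundary residual; here a simple nonlinear model fit.
    return torch.tanh(X @ w) - y

torch.manual_seed(0)
X, y = torch.randn(64, 5), torch.randn(64)
w = torch.zeros(5)

r = residual(w, X, y)
J = torch.autograd.functional.jacobian(lambda w_: residual(w_, X, y), w)  # (64, 5)
damping = 1e-3
step = torch.linalg.solve(J.T @ J + damping * torch.eye(5), J.T @ r)
w = w - step
print("loss after one Gauss-Newton step:", 0.5 * residual(w, X, y).pow(2).sum().item())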
Memory is a limiting resource for many deep learning tasks. Besides the neural network weights, one main memory consumer is the computation graph built up by automatic differentiation (AD) for backpropagation. We observe that PyTorch's current AD implementation …
External link: http://arxiv.org/abs/2404.12406
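A small, hedged illustration of the memory consumer described above (assuming PyTorch ≥ 1.10, and not the paper's implementation): count the bytes autograd saves for backpropagation, with and without freezing the early layers of a toy model.

# Minimal sketch: measuring the tensors autograd stores for the backward pass.
import torch
from torch import nn

def saved_bytes(model, x):
    total = 0
    def pack(t):
        nonlocal total
        total += t.numel() * t.element_size()  # tally every tensor saved for backward
        return t
    with torch.autograd.graph.saved_tensors_hooks(pack, lambda t: t):
        model(x).sum()
    return total

model = nn.Sequential(*(nn.Linear(512, 512) for _ in range(6)), nn.Linear(512, 1))
x = torch.randn(256, 512)

print("all layers trainable:", saved_bytes(model, x), "bytes")
for layer in list(model)[:5]:          # freeze the first five layers
    for p in layer.parameters():
        p.requires_grad_(False)
print("first layers frozen:  ", saved_bytes(model, x), "bytes")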
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product, which is incorporated into the parameter update …
External link: http://arxiv.org/abs/2402.03496
Author: Lin, Wu; Dangel, Felix; Eschenhagen, Runa; Neklyudov, Kirill; Kristiadi, Agustinus; Turner, Richard E.; Makhzani, Alireza
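To ground the entry above, a minimal sketch (not the referenced paper's method) of an Adam-style update whose diagonal preconditioner is the running average of g ⊙ g, i.e. the diagonal of the gradient outer product; hyperparameters and shapes are illustrative.

# Minimal sketch: one Adam-style parameter update.
import torch

def adam_step(p, g, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, t=1):
    m = beta1 * m + (1 - beta1) * g             # first moment (momentum)
    v = beta2 * v + (1 - beta2) * g * g         # diagonal of the gradient outer product
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (v_hat.sqrt() + eps)   # diagonal preconditioning
    return p, m, v

p, g = torch.randn(4), torch.randn(4)
m, v = torch.zeros(4), torch.zeros(4)
p, m, v = adam_step(p, g, m, v)
print(p)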
Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition …
External link: http://arxiv.org/abs/2312.05705
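As a hedged illustration (not the method proposed in the paper), the sketch below applies a KFAC-style Kronecker-factored preconditioner to the gradient of one linear layer; both factors are dense and must be inverted, which is the cost and stability issue the abstract points to. All dimensions are made up.

# Illustrative sketch: KFAC-style preconditioning of a linear layer's gradient.
import torch

torch.manual_seed(0)
batch, d_in, d_out = 128, 64, 32
a = torch.randn(batch, d_in)                      # layer inputs
g = torch.randn(batch, d_out)                     # backpropagated output gradients
grad_W = g.T @ a / batch                          # weight gradient, (d_out, d_in)

damping = 1e-3
A = a.T @ a / batch + damping * torch.eye(d_in)   # input covariance factor (dense)
G = g.T @ g / batch + damping * torch.eye(d_out)  # output-gradient covariance factor (dense)

# (A ⊗ G)^{-1} vec(∇W) corresponds to G^{-1} ∇W A^{-1} for symmetric A, G
precond_grad = torch.linalg.inv(G) @ grad_W @ torch.linalg.inv(A)
print(precond_grad.shape)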
The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits …
External link: http://arxiv.org/abs/2310.00137
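For concreteness, a minimal sketch (not from the paper) of the empirical neural tangent kernel of a toy scalar-output network, K(x_i, x_j) = ⟨∇_θ f(x_i), ∇_θ f(x_j)⟩, built from per-example parameter gradients.

# Minimal sketch: empirical NTK Gram matrix of a tiny network.
import torch
from torch import nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(3, 16), nn.Tanh(), nn.Linear(16, 1))
params = list(net.parameters())
X = torch.randn(5, 3)

def per_example_grad(x):
    grads = torch.autograd.grad(net(x.unsqueeze(0)).squeeze(), params)
    return torch.cat([g.reshape(-1) for g in grads])

J = torch.stack([per_example_grad(x) for x in X])  # (num_examples, num_params)
ntk = J @ J.T                                      # empirical NTK Gram matrix
print(ntk.shape)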
Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods
Author: Dangel, Felix
Published in: Advances in Neural Information Processing Systems (NeurIPS) 2024
Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the transfer of theoretical and algorithmic ideas to convolutions. We simplify convolutions by viewing them as tensor networks (TNs) that allow …
External link: http://arxiv.org/abs/2307.02275
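A small sketch of the basic idea, not the paper's full tensor-network machinery: a 2d convolution written as an einsum over the unfolded (im2col) input and checked against torch.nn.functional.conv2d; shapes are illustrative.

# Illustrative sketch: convolution expressed as an einsum contraction.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 3, 8, 8)          # (batch, in_channels, height, width)
w = torch.randn(4, 3, 3, 3)          # (out_channels, in_channels, kH, kW)

patches = F.unfold(x, kernel_size=3)                 # (batch, in_ch*kH*kW, num_patches)
kernel = w.reshape(4, -1)                            # (out_channels, in_ch*kH*kW)
out = torch.einsum("ok,bkp->bop", kernel, patches)   # the convolution as a contraction
out = out.reshape(2, 4, 6, 6)                        # restore the spatial output map

print(torch.allclose(out, F.conv2d(x, w), atol=1e-5))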
Model reparametrization, which follows the change-of-variable rule of calculus, is a popular way to improve the training of neural nets. But it can also be problematic since it can induce inconsistencies in, e.g., Hessian-based flatness measures, optimization …
External link: http://arxiv.org/abs/2302.07384
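A tiny worked example (mine, not the paper's) of the inconsistency mentioned above: under the reparametrization θ = exp(η), the change-of-variable rule gives d²L/dη² = (dθ/dη)² d²L/dθ² + (d²θ/dη²) dL/dθ, so a Hessian-based flatness value depends on the coordinates chosen.

# Minimal sketch: the same point looks differently "flat" after reparametrization.
import torch

def loss_in_theta(t):
    return ((t - 2.0) ** 2).sum()

def loss_in_eta(e):                  # identical loss under θ = exp(η)
    return loss_in_theta(torch.exp(e))

theta = torch.tensor([3.0])
eta = theta.log()                    # the corresponding point in the new coordinates

h_theta = torch.autograd.functional.hessian(loss_in_theta, theta)
h_eta = torch.autograd.functional.hessian(loss_in_eta, eta)
print(h_theta.item(), h_eta.item())  # 2.0 vs. 9*2 + 3*2 = 24.0 at the same point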
Curvature in the form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication via automatic differentiation …
External link: http://arxiv.org/abs/2106.02624
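As an illustration of implicit curvature-matrix multiplication via automatic differentiation (a generic Pearlmutter-style Hessian-vector product, not the method proposed in the paper), consider the sketch below; model and data are placeholders.

# Minimal sketch: Hessian-vector product via double backpropagation.
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
X, y = torch.randn(16, 10), torch.randn(16, 1)
loss = nn.functional.mse_loss(model(X), y)

params = list(model.parameters())
grads = torch.autograd.grad(loss, params, create_graph=True)
v = [torch.randn_like(p) for p in params]                  # vector to multiply with

grad_dot_v = sum((g * vi).sum() for g, vi in zip(grads, v))
hvp = torch.autograd.grad(grad_dot_v, params)              # Hessian-vector product
print([h.shape for h in hvp])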