Výsledky vyhledávání - "DANGEL, FELIX"

Report

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Autor: Ormaniec, Weronika, Dangel, Felix, Singh, Sidak Pal

The Transformer architecture has inarguably revolutionized deep learning, overtaking classical architectures like multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs). At its core, the attention block differs in form and functional

Externí odkaz: http://arxiv.org/abs/2410.10986

Zobrazit plný text záznamu

Report

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

Autor: Elsayed, Mohamed, Farrahi, Homayoon, Dangel, Felix, Mahmood, A. Rupam

Second-order information is valuable for many applications but challenging to compute. Several works focus on computing or approximating Hessian diagonals, but even this simplification introduces significant additional costs compared to computing a g

Externí odkaz: http://arxiv.org/abs/2406.03276

Zobrazit plný text záznamu

Report

Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks

Autor: Dangel, Felix, Müller, Johannes, Zeinhofer, Marius

Publikováno v: Advances in Neural Information Processing Systems (NeurIPS) 2024

Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods

Externí odkaz: http://arxiv.org/abs/2405.15603

Zobrazit plný text záznamu

Report

Lowering PyTorch's Memory Consumption for Selective Differentiation

Autor: Bhatia, Samarth, Dangel, Felix

Memory is a limiting resource for many deep learning tasks. Beside the neural network weights, one main memory consumer is the computation graph built up by automatic differentiation (AD) for backpropagation. We observe that PyTorch's current AD impl

Externí odkaz: http://arxiv.org/abs/2404.12406

Zobrazit plný text záznamu

Report

Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

Autor: Lin, Wu, Dangel, Felix, Eschenhagen, Runa, Bae, Juhan, Turner, Richard E., Makhzani, Alireza

Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers. Their diagonal preconditioner is based on the gradient outer product which is incorporated into the parameter up

Externí odkaz: http://arxiv.org/abs/2402.03496

Zobrazit plný text záznamu

Report

Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC

Autor: Lin, Wu, Dangel, Felix, Eschenhagen, Runa, Neklyudov, Kirill, Kristiadi, Agustinus, Turner, Richard E., Makhzani, Alireza

Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or

Externí odkaz: http://arxiv.org/abs/2312.05705

Zobrazit plný text záznamu

Report

On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective

Autor: Wenger, Jonathan, Dangel, Felix, Kristiadi, Agustinus

The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits, whi

Externí odkaz: http://arxiv.org/abs/2310.00137

Zobrazit plný text záznamu

Report

Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods

Autor: Dangel, Felix

Publikováno v: Advances in Neural Information Processing Systems (NeurIPS) 2024

Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the transfer of theoretical and algorithmic ideas to convolutions. We simplify convolutions by viewing them as tensor networks (TNs) that al

Externí odkaz: http://arxiv.org/abs/2307.02275

Zobrazit plný text záznamu

Report

The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

Autor: Kristiadi, Agustinus, Dangel, Felix, Hennig, Philipp

Model reparametrization, which follows the change-of-variable rule of calculus, is a popular way to improve the training of neural nets. But it can also be problematic since it can induce inconsistencies in, e.g., Hessian-based flatness measures, opt

Externí odkaz: http://arxiv.org/abs/2302.07384

Zobrazit plný text záznamu

Report

ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure

Autor: Dangel, Felix, Tatzel, Lukas, Hennig, Philipp

Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) approximation is valuable for algorithms that rely on a local model for the loss to train, compress, or explain deep networks. Existing methods based on implicit multiplication vi

Externí odkaz: http://arxiv.org/abs/2106.02624

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání