Výsledky vyhledávání - "Gusak, Julia"

Report

Quantization Aware Factorization for Deep Neural Network Compression

Autor: Cherniuk, Daria, Abukhovich, Stanislav, Phan, Anh-Huy, Oseledets, Ivan, Cichocki, Andrzej, Gusak, Julia

Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce parameters and FLOP in neural networks. Due to memory and power consumption limitations of mobile or embedded devices, the quantization step is usually nec

Externí odkaz: http://arxiv.org/abs/2308.04595

Zobrazit plný text záznamu

Report

Rockmate: an Efficient, Fast, Automatic and Generic Tool for Re-materialization in PyTorch

Autor: Zhao, Xunyi, Hellard, Théotime Le, Eyraud, Lionel, Gusak, Julia, Beaumont, Olivier

We propose Rockmate to control the memory requirements when training PyTorch DNN models. Rockmate is an automatic tool that starts from the model code and generates an equivalent model, using a predefined amount of memory for activations, at the cost

Externí odkaz: http://arxiv.org/abs/2307.01236

Zobrazit plný text záznamu

Report

Efficient GPT Model Pre-training using Tensor Train Matrix Representation

Autor: Chekalina, Viktoriia, Novikov, Georgii, Gusak, Julia, Oseledets, Ivan, Panchenko, Alexander

Large-scale transformer models have shown remarkable performance in language modelling tasks. However, such models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch. To reduce the

Externí odkaz: http://arxiv.org/abs/2306.02697

Zobrazit plný text záznamu

Report

Survey on Large Scale Neural Network Training

Autor: Gusak, Julia, Cherniuk, Daria, Shilova, Alena, Katrutsa, Alexander, Bershatsky, Daniel, Zhao, Xunyi, Eyraud-Dubois, Lionel, Shlyazhko, Oleg, Dimitrov, Denis, Oseledets, Ivan, Beaumont, Olivier

Modern Deep Neural Networks (DNNs) require significant memory to store weight, activations, and other intermediate tensors during training. Hence, many models do not fit one GPU device or can be trained using only a small per-GPU batch size. This sur

Externí odkaz: http://arxiv.org/abs/2202.10435

Zobrazit plný text záznamu

Report

Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction

Autor: Novikov, Georgii, Bershatsky, Daniel, Gusak, Julia, Shonenkov, Alex, Dimitrov, Denis, Oseledets, Ivan

Memory footprint is one of the main limiting factors for large neural network training. In backpropagation, one needs to store the input to each operation in the computational graph. Every modern neural network model has quite a few pointwise nonline

Externí odkaz: http://arxiv.org/abs/2202.00441

Zobrazit plný text záznamu

Report

Memory-Efficient Backpropagation through Large Linear Layers

Autor: Bershatsky, Daniel, Mikhalev, Aleksandr, Katrutsa, Alexandr, Gusak, Julia, Merkulov, Daniil, Oseledets, Ivan

In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the gradients of

Externí odkaz: http://arxiv.org/abs/2201.13195

Zobrazit plný text záznamu

Report

Meta-Solver for Neural Ordinary Differential Equations

Autor: Gusak, Julia, Katrutsa, Alexandr, Daulbaev, Talgat, Cichocki, Andrzej, Oseledets, Ivan

A conventional approach to train neural ordinary differential equations (ODEs) is to fix an ODE solver and then learn the neural network's weights to optimize a target loss function. However, such an approach is tailored for a specific discretization

Externí odkaz: http://arxiv.org/abs/2103.08561

Zobrazit plný text záznamu

Report

Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network

Autor: Phan, Anh-Huy, Sobolev, Konstantin, Sozykin, Konstantin, Ermilov, Dmitry, Gusak, Julia, Tichavsky, Petr, Glukhov, Valeriy, Oseledets, Ivan, Cichocki, Andrzej

Most state of the art deep neural networks are overparameterized and exhibit a high computational cost. A straightforward approach to this problem is to replace convolutional kernels with its low-rank tensor approximations, whereas the Canonical Poly

Externí odkaz: http://arxiv.org/abs/2008.05441

Zobrazit plný text záznamu

Report

Towards Understanding Normalization in Neural ODEs

Autor: Gusak, Julia, Markeeva, Larisa, Daulbaev, Talgat, Katrutsa, Alexandr, Cichocki, Andrzej, Oseledets, Ivan

Normalization is an important and vastly investigated technique in deep learning. However, its role for Ordinary Differential Equation based networks (neural ODEs) is still poorly understood. This paper investigates how different normalization techni

Externí odkaz: http://arxiv.org/abs/2004.09222

Zobrazit plný text záznamu

Report

Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs

Autor: Daulbaev, Talgat, Katrutsa, Alexandr, Markeeva, Larisa, Gusak, Julia, Cichocki, Andrzej, Oseledets, Ivan

We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as "adjoint method") to train neural ODEs on classification, dens

Externí odkaz: http://arxiv.org/abs/2003.05271

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání