SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining

Authors:	Han, Andi; Li, Jiaxiang; Huang, Wei; Hong, Mingyi; Takeda, Akiko; Jawanpuria, Pratik; Mishra, Bamdev
Publication year:	2024
Subject:	Computer Science - Machine Learning
Document type:	Working Paper
Description:	Large language models (LLMs) have shown impressive capabilities across various tasks. However, training LLMs from scratch requires significant computational power and extensive memory capacity. Recent studies have explored low-rank structures on weights for efficient fine-tuning in terms of parameters and memory, either through low-rank adaptation or factorization. While effective for fine-tuning, low-rank structures are generally less suitable for pretraining because they restrict parameters to a low-dimensional subspace. In this work, we propose to parameterize the weights as a sum of low-rank and sparse matrices for pretraining, which we call SLTrain. The low-rank component is learned via matrix factorization, while for the sparse component, we employ a simple strategy of uniformly selecting the sparsity support at random and learning only the non-zero entries with the fixed support. Although simple, the random fixed-support sparse learning strategy significantly enhances pretraining when combined with low-rank learning. Our results show that SLTrain adds minimal extra parameters and memory costs compared to pretraining with low-rank parameterization, yet achieves substantially better performance, which is comparable to full-rank training. Remarkably, when combined with quantization and per-layer updates, SLTrain can reduce memory requirements by up to 73% when pretraining the LLaMA 7B model.
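Illustration:	The parameterization described above (a weight written as a low-rank product plus a sparse matrix whose support is drawn uniformly at random and then frozen) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `density` parameter, the factor names `B` and `A`, and the initialization scales are all hypothetical choices made for illustration, and no training loop is shown.

```python
import numpy as np


def init_sparse_plus_low_rank(d_out, d_in, rank, density, seed=0):
    """Create the trainable pieces of W = B @ A + S (illustrative only).

    Assumptions (not taken from the source): Gaussian init for B and A,
    zero init for the sparse values, and `density` as the fraction of
    entries of W that belong to the sparse support.
    """
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((d_out, rank)) / np.sqrt(rank)
    A = rng.standard_normal((rank, d_in)) / np.sqrt(d_in)
    nnz = int(density * d_out * d_in)
    # Support is sampled uniformly at random, without replacement, and is
    # never updated afterwards; only `values` would be trained.
    support = rng.choice(d_out * d_in, size=nnz, replace=False)
    values = np.zeros(nnz)
    return B, A, support, values


def compose_weight(B, A, support, values):
    """Materialize the dense weight W = B @ A + S from its pieces."""
    d_out, d_in = B.shape[0], A.shape[1]
    flat = (B @ A).reshape(-1)      # low-rank part, flattened row-major
    flat[support] += values         # scatter-add the sparse entries
    return flat.reshape(d_out, d_in)


def trainable_param_count(d_out, d_in, rank, density):
    """Trainable parameters: low-rank factors plus sparse values."""
    return rank * (d_out + d_in) + int(density * d_out * d_in)
```

For example, with the hypothetical settings `d_out = d_in = 4096`, `rank = 128`, and `density = 0.03`, `trainable_param_count` can be compared against the `4096 * 4096` entries of a dense weight to see how few parameters the sparse term adds on top of the low-rank factors. The integer indices in `support` are stored but not trained.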
Database:	arXiv
External link:	http://arxiv.org/abs/2406.02214