A Convex-optimization-based Layer-wise Post-training Pruner for Large Language Models

Author:	Zhao, Pengxiang; Hu, Hanyu; Li, Ping; Zheng, Yi; Wang, Zhefeng; Yuan, Xiaoming
Publication year:	2024
Subject:	Computer Science - Machine Learning; Mathematics - Optimization and Control
Document type:	Working Paper
Description:	Pruning is a critical strategy for compressing trained large language models (LLMs), aiming at substantial memory conservation and computational acceleration without compromising performance. However, existing pruning methods often necessitate inefficient retraining for billion-scale LLMs or rely on heuristic methods such as the optimal brain surgeon framework, which degrade performance. In this paper, we introduce FISTAPruner, the first post-training pruner based on convex optimization models and algorithms. Specifically, we propose a convex optimization model incorporating the $\ell_1$ norm to induce sparsity and utilize the FISTA solver for optimization. FISTAPruner incorporates an intra-layer cumulative error correction mechanism and supports parallel pruning. We comprehensively evaluate FISTAPruner on models such as OPT, LLaMA, LLaMA-2, and LLaMA-3 with 125M to 70B parameters under unstructured and 2:4 semi-structured sparsity, demonstrating superior performance over existing state-of-the-art methods across various language benchmarks.
Database:	arXiv
External link:	http://arxiv.org/abs/2408.03728
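
The description mentions an $\ell_1$-regularized convex model solved with FISTA. As a rough illustration only, the sketch below applies textbook FISTA to a generic layer-wise reconstruction objective, 0.5*||X W - X W_dense||_F^2 + lam*||W||_1. This objective, the function names (`soft_threshold`, `fista_prune`, `objective`), the parameter `lam`, and the random data are all assumptions made for the example; the sketch does not reproduce the paper's exact model, its error correction mechanism, or its 2:4 sparsity handling.

```python
import numpy as np


def soft_threshold(Z, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def objective(X, W, W_dense, lam):
    """Assumed objective: reconstruction error plus l1 penalty."""
    return 0.5 * np.linalg.norm(X @ W - X @ W_dense) ** 2 + lam * np.abs(W).sum()


def fista_prune(X, W_dense, lam, n_iter=200):
    """Textbook FISTA on the assumed objective (hypothetical helper)."""
    G = X.T @ X                      # Gram matrix of the layer inputs
    B = G @ W_dense                  # constant part of the gradient
    L = np.linalg.eigvalsh(G)[-1]    # Lipschitz constant: largest eigenvalue of G
    W = W_dense.copy()
    Y = W.copy()                     # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = G @ Y - B             # gradient of the smooth term at Y
        W_next = soft_threshold(Y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = W_next + ((t - 1.0) / t_next) * (W_next - W)  # momentum step
        W, t = W_next, t_next
    return W


# Usage on synthetic data (shapes and lam chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))        # 64 calibration samples, 16 input features
W_dense = rng.standard_normal((16, 8))   # dense weights of one linear layer
W_sparse = fista_prune(X, W_dense, lam=20.0)
print("nonzeros:", np.count_nonzero(W_sparse), "of", W_dense.size)
```

Larger values of `lam` push more weights to exactly zero at the cost of higher reconstruction error; reaching a fixed target sparsity would need an additional step (for example tuning `lam` or thresholding), which this sketch leaves out.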