Rank and run-time aware compression of NLP Applications
Authors: Dibakar Gope, Urmish Thakker, Ganesh Dasika, Matthew Mattina, Jesse Beu
Year of publication: 2020
Subject: FOS: Computer and information sciences; Machine Learning (cs.LG); Computation and Language (cs.CL); Performance (cs.PF); Natural language processing; Language model; Rank (linear algebra); Matrix decomposition; Compression; Pruning; Inference; Translation; Artificial intelligence
DOI: 10.48550/arxiv.2010.03193
Description: Sequence-model-based NLP applications can be large. Yet many applications that benefit from them run on small devices with very limited compute and storage, while still facing run-time constraints. As a result, there is a need for a compression technique that achieves significant compression without hurting inference run-time or task accuracy. This paper proposes a new compression technique, Hybrid Matrix Factorization (HMF), that achieves this dual objective. HMF improves on low-rank matrix factorization (LMF) by doubling the rank of the approximation through an intelligent hybrid structure, yielding better accuracy than LMF (a minimal sketch of the idea follows this record). Further, because it preserves dense matrices, it achieves faster inference run-time than pruning or structured-matrix-based compression techniques. We evaluate the technique on 5 NLP benchmarks across multiple tasks (translation, intent detection, language modeling) and show that, for similar accuracy values and compression factors, HMF achieves more than 2.32x faster inference run-time than pruning and 16.77% better accuracy than LMF.
Comment: Published at SustaiNLP@EMNLP 2020. arXiv admin note: text overlap with arXiv:1906.04886
Database: OpenAIRE
External link:
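The description outlines the core idea: split the weight matrix into a part that stays dense and a part that is replaced by a low-rank product, so the approximation's effective rank roughly doubles relative to plain LMF at a matched parameter budget. Below is a minimal, illustrative NumPy sketch of that idea under those stated assumptions; it is not the paper's reference implementation, and the split point `k`, rank `r`, and the function name `hmf_approximate` are illustrative choices.

```python
import numpy as np

def hmf_approximate(W, k, r):
    """Illustrative hybrid factorization: keep the first k rows of W
    dense and approximate the remaining rows with a rank-r product
    obtained from a truncated SVD. The result has effective rank up
    to k + r, versus r for plain LMF at a similar parameter count.
    NOTE: this row-wise split and these names are assumptions for
    the sketch, not the paper's reference code."""
    dense_rows = W[:k, :]                      # stored unfactorized: k * n params
    U, s, Vt = np.linalg.svd(W[k:, :], full_matrices=False)
    A = U[:, :r] * s[:r]                       # (m - k) x r factor
    B = Vt[:r, :]                              # r x n factor
    return np.vstack([dense_rows, A @ B])      # reconstructed m x n matrix

rng = np.random.default_rng(0)
m, n, k, r = 64, 64, 8, 8
W = rng.standard_normal((m, n))

# Parameter count of the hybrid scheme, and an LMF rank with a
# roughly matching parameter budget for comparison.
hmf_params = k * n + (m - k) * r + r * n
r_lmf = hmf_params // (m + n)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_lmf = (U[:, :r_lmf] * s[:r_lmf]) @ Vt[:r_lmf, :]

W_hmf = hmf_approximate(W, k, r)
print(f"HMF params {hmf_params}, matched LMF rank {r_lmf}")
print("HMF reconstruction error:", np.linalg.norm(W - W_hmf))
print("LMF reconstruction error:", np.linalg.norm(W - W_lmf))
```

Note that the dense block and both factors are all plain dense matrices, so inference reduces to a few dense GEMMs. This is consistent with the run-time argument in the description: unlike pruning or structured-matrix approaches, no sparse or specialized kernels are needed.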