Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation
Author: | Rajiv Movva, Jason Y. Zhao |
---|---|
Year of publication: | 2020 |
Subject: |
FOS: Computer and information sciences; Computer Science - Machine Learning; Lottery ticket; Machine translation; Computer science; Machine Learning (stat.ML); 02 engineering and technology; computer.software_genre; Machine learning; Machine Learning (cs.LG); 03 medical and health sciences; 0302 clinical medicine; Statistics - Machine Learning; Behavioral study; 0202 electrical engineering, electronic engineering, information engineering; Semantic information; Transformer (machine learning model); Computer Science - Computation and Language; business.industry; I.2.7; 030221 ophthalmology & optometry; 020201 artificial intelligence & image processing; Artificial intelligence; business; Computation and Language (cs.CL); computer |
Source: | BlackboxNLP@EMNLP |
Description: | Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing Transformers with more and more low-magnitude weights pruned away, we find that complex semantic information is first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts. Meanwhile, early layers of sparse models begin to perform more encoding. Attention mechanisms remain remarkably consistent as sparsity increases. Comment: Camera-ready for BlackboxNLP @ EMNLP 2020 |
Database: | OpenAIRE |
External link: |