Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation

Authors:	Rajiv Movva, Jason Y. Zhao
Year of publication:	2020
Subject:	FOS: Computer and information sciences; Computer Science - Machine Learning; Lottery ticket; Machine translation; Computer science; Machine Learning (stat.ML); 02 engineering and technology; computer.software_genre; Machine learning; Machine Learning (cs.LG); 03 medical and health sciences; 0302 clinical medicine; Statistics - Machine Learning; Behavioral study; 0202 electrical engineering, electronic engineering, information engineering; Semantic information; Transformer (machine learning model); Computer Science - Computation and Language; business.industry; I.2.7; 030221 ophthalmology & optometry; 020201 artificial intelligence & image processing; Artificial intelligence; business; Computation and Language (cs.CL); computer
Source:	BlackboxNLP@EMNLP
Description:	Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing Transformers with more and more low-magnitude weights pruned away, we find that complex semantic information is first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts. Meanwhile, early layers of sparse models begin to perform more encoding. Attention mechanisms remain remarkably consistent as sparsity increases. Comment: Camera-ready for BlackboxNLP @ EMNLP 2020
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2e8c7076f2dd3464beeb7ea89836d422 https://doi.org/10.18653/v1/2020.blackboxnlp-1.19