Author: |
Zhang, Yichi; Garg, Ankush; Cao, Yuan; Lew, Łukasz; Ghorbani, Behrooz; Zhang, Zhiru; Firat, Orhan |
Year of publication: |
2023 |
Subject: |
|
Source: |
Published at NeurIPS 2023 |
Document type: |
Working Paper |
Description: |
The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify and address the problem of inflated dot-product variance when using one-bit weights and activations. Specifically, BMT leverages additional LayerNorms and residual connections to improve binarization quality. Experiments on the WMT dataset show that a one-bit weight-only Transformer can achieve the same quality as a float one, while being 16x smaller in size. One-bit activations incur varying degrees of quality drop, but this drop is mitigated by the proposed architectural changes. We further conduct a scaling law study using production-scale translation datasets, which shows that one-bit-weight Transformers scale and generalize well in both in-domain and out-of-domain settings. The implementation in JAX/Flax will be open sourced. |
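The ideas named in the abstract (one-bit weights, an extra LayerNorm before the binarized operation, and a residual connection) can be illustrated with a minimal Flax sketch. This is not the authors' open-sourced implementation: the names binarize_ste and BinaryDense, the mean-absolute-value scaling, and the straight-through estimator are illustrative assumptions about a typical sign-based binarizer.

    # Minimal illustrative sketch (assumed design, not the paper's released code):
    # a dense layer with one-bit weights via a straight-through estimator,
    # a LayerNorm before the binarized matmul to tame dot-product variance,
    # and a residual connection when shapes allow.
    import jax
    import jax.numpy as jnp
    import flax.linen as nn

    def binarize_ste(w):
        # Sign-binarize, scaled by the mean absolute value; stop_gradient makes
        # the backward pass treat the binarization as the identity.
        scale = jnp.mean(jnp.abs(w))
        w_bin = scale * jnp.sign(w)
        return w + jax.lax.stop_gradient(w_bin - w)

    class BinaryDense(nn.Module):
        features: int

        @nn.compact
        def __call__(self, x):
            w = self.param("kernel", nn.initializers.lecun_normal(),
                           (x.shape[-1], self.features))
            h = nn.LayerNorm()(x)       # extra LayerNorm before the binarized matmul
            y = h @ binarize_ste(w)     # one-bit weights in the forward pass
            if x.shape[-1] == self.features:
                y = y + x               # residual connection around the binarized layer
            return y

    # Example usage:
    layer = BinaryDense(features=512)
    params = layer.init(jax.random.PRNGKey(0), jnp.ones((1, 4, 512)))
    out = layer.apply(params, jnp.ones((1, 4, 512)))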
Database: |
arXiv |
External link: |
|