MixUp Training Leads to Reduced Overfitting and Improved Calibration for the Transformer Architecture

Authors: Zhang, Wancong; Vaidya, Ieshan
Year of publication: 2021
Subject:
Document type: Working Paper
Description: MixUp is a computer vision data augmentation technique that uses convex interpolations of input data and their labels to enhance model generalization during training. However, the application of MixUp to the natural language understanding (NLU) domain has been limited, due to the difficulty of interpolating text directly in the input space. In this study, we propose MixUp methods at the Input, Manifold, and sentence embedding levels for the transformer architecture, and apply them to fine-tune the BERT model on a diverse set of NLU tasks. We find that MixUp can improve model performance, as well as reduce test loss and model calibration error by up to 50%.
Database: arXiv
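The core MixUp operation described in the abstract (a convex interpolation of two examples and their labels, with the mixing coefficient drawn from a Beta distribution) can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function name, vector shapes, and the `alpha` value are assumptions.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Convex interpolation of two examples and their labels (MixUp).

    x1, x2: input vectors (e.g., sentence embeddings); y1, y2: one-hot
    label vectors. `alpha` parameterizes the Beta distribution from
    which the mixing coefficient lambda is drawn. All names and shapes
    here are illustrative, not taken from the paper.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)        # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2     # interpolate inputs
    y = lam * y1 + (1.0 - lam) * y2     # interpolate (soft) labels
    return x, y, lam

# Example: mix two 4-dim "sentence embeddings" with one-hot labels.
x_mix, y_mix, lam = mixup(
    np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 0.0]),
    np.array([0.0, 1.0, 0.0, 0.0]), np.array([0.0, 1.0]),
)
```

For Input- or Manifold-level MixUp in a transformer, the same interpolation would be applied to token embeddings or intermediate hidden states rather than to a pooled sentence vector; the resulting soft labels are what drive the calibration benefit.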