Rethink the Effectiveness of Text Data Augmentation: An Empirical Analysis

Author:	Shi, Zhengxiang; Lipani, Aldo
Year of publication:	2023
Subject:	Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
Document type:	Working Paper
Description:	In recent years, language models (LMs) have made remarkable progress in advancing the field of natural language processing (NLP). However, the impact of data augmentation (DA) techniques on the fine-tuning (FT) performance of these LMs has been a topic of ongoing debate. In this study, we evaluate the effectiveness of three different FT methods in conjunction with back-translation across an array of 7 diverse NLP tasks, including classification and regression types, covering single-sentence and sentence-pair tasks. Contrary to prior assumptions that DA does not contribute to the enhancement of LMs' FT performance, our findings reveal that continued pre-training on augmented data can effectively improve the FT performance of the downstream tasks. In the most favourable case, continued pre-training improves the performance of FT by more than 10% in the few-shot learning setting. Our finding highlights the potential of DA as a powerful tool for bolstering LMs' performance.
Comment:	Accepted at ESANN 2023
Database:	arXiv
External link:	http://arxiv.org/abs/2306.07664