Minimally-Augmented Grammatical Error Correction

Author: Marcin Junczys-Dowmunt, Roman Grundkiewicz
Language: English
Publication year: 2019
Subject:
Source: Grundkiewicz, R & Junczys-Dowmunt, M 2019, Minimally-Augmented Grammatical Error Correction. In Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019). pp. 357–363, The 5th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2019, Hong Kong, 4 November 2019. https://doi.org/10.18653/v1/D19-5546
W-NUT@EMNLP
DOI: 10.18653/v1/D19-5546
Description: There has been an increased interest in low-resource approaches to automatic grammatical error correction (GEC). We introduce Minimally-Augmented Grammatical Error Correction (MAGEC), which does not require any error-labelled data. Our unsupervised approach relies on a simple but effective synthetic error generation method that draws on confusion sets from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique.
Database: OpenAIRE
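
Illustration: the description mentions synthetic error generation from confusion sets. The Python sketch below shows the general idea only and is not the authors' implementation. The confusion sets are a hand-written toy dictionary standing in for spell-checker suggestions, and the function names, the 0.15 error rate, and the substitution-only noise model are all assumptions made for this example.

```python
import random

# Hypothetical toy confusion sets. In the approach described above these
# would be derived from a spell-checker's suggestions; here they are
# hand-written purely for illustration.
CONFUSION_SETS = {
    "their": ["there", "they're"],
    "than": ["then"],
    "affect": ["effect"],
}


def corrupt(tokens, confusion_sets, error_rate=0.15, rng=None):
    """Replace tokens that have a confusion set with a random confusable word.

    error_rate is the per-token probability of substitution (an assumed
    value, not one taken from the paper).
    """
    rng = rng or random.Random()
    noisy = []
    for tok in tokens:
        candidates = confusion_sets.get(tok.lower())
        if candidates and rng.random() < error_rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(tok)
    return noisy


def make_training_pair(sentence, confusion_sets, error_rate=0.15, seed=None):
    """Return (noisy_source, clean_target) built from one clean sentence."""
    rng = random.Random(seed)
    tokens = sentence.split()
    noisy = corrupt(tokens, confusion_sets, error_rate, rng)
    return " ".join(noisy), sentence


if __name__ == "__main__":
    source, target = make_training_pair(
        "their plan is better than ours", CONFUSION_SETS, error_rate=1.0, seed=0
    )
    print(source)
    print(target)
```

Each clean sentence yields a (noisy source, clean target) pair, so a correction model can be trained on such pairs without any error-annotated data.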