Minimally-Augmented Grammatical Error Correction
Author: | Marcin Junczys-Dowmunt, Roman Grundkiewicz |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: | Training set; Computer science; business.industry; 02 engineering and technology; 010501 environmental sciences; Machine learning; computer.software_genre; 01 natural sciences; language.human_language; Grammatical error; German; Margin (machine learning); Simple (abstract algebra); 0202 electrical engineering, electronic engineering, information engineering; language; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; 0105 earth and related environmental sciences |
Source: | Grundkiewicz, R. & Junczys-Dowmunt, M. 2019, "Minimally-Augmented Grammatical Error Correction", in Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pp. 357–363. The 5th Workshop on Noisy User-generated Text (W-NUT) at EMNLP 2019, Hong Kong, 4 November 2019. https://doi.org/10.18653/v1/D19-5546 |
DOI: | 10.18653/v1/D19-5546 |
Description: | There has been increased interest in low-resource approaches to automatic grammatical error correction (GEC). We introduce Minimally-Augmented Grammatical Error Correction (MAGEC), which does not require any error-labelled data. Our unsupervised approach relies on a simple but effective synthetic error generation method that uses confusion sets derived from inverted spell-checkers. In low-resource settings, we outperform the current state-of-the-art results for the German and Russian GEC tasks by a large margin without using any real error-annotated training data. When combined with labelled data, our method can serve as an efficient pre-training technique. |
Database: | OpenAIRE |
External link: |
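The description says only that MAGEC generates synthetic errors from confusion sets obtained by inverting spell-checkers. The sketch below illustrates that general idea under stated assumptions and is not the authors' implementation. Instead of querying a real spell-checker, the hypothetical helper `build_confusion_sets` treats every vocabulary word within `max_dist` Levenshtein edits as a plausible "suggestion". The hypothetical `corrupt` function then replaces each clean token with a random member of its confusion set with probability `error_rate`. Only word substitution is modelled here; any other error operations are outside this sketch.

```python
import random
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def build_confusion_sets(vocab: List[str], max_dist: int = 1) -> Dict[str, List[str]]:
    """Hypothetical stand-in for inverted spell-checker suggestions:
    a correct word maps to the other vocabulary words a spell-checker
    might propose for it, approximated here by small edit distance."""
    return {
        w: [v for v in vocab if v != w and edit_distance(w, v) <= max_dist]
        for w in vocab
    }


def corrupt(tokens: List[str], confusion: Dict[str, List[str]],
            error_rate: float, rng: random.Random) -> List[str]:
    """Replace each token that has a non-empty confusion set with a random
    member of that set, with probability `error_rate`; keep the rest."""
    noisy = []
    for tok in tokens:
        candidates = confusion.get(tok, [])
        if candidates and rng.random() < error_rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(tok)
    return noisy


if __name__ == "__main__":
    vocab = ["there", "their", "then", "than", "the", "they",
             "is", "it", "in", "a", "an"]
    confusion = build_confusion_sets(vocab)
    clean = "then it is a cat".split()
    noisy = corrupt(clean, confusion, error_rate=0.3, rng=random.Random(1))
    # A (noisy source, clean target) pair usable as synthetic GEC training data.
    print(" ".join(noisy), "->", " ".join(clean))
```

In this sketch, drawing replacements only from a word's confusion set keeps the injected errors close to real typos and near-homophones, and pairing each corrupted sentence with its clean original yields parallel training data without any human error annotation.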