Erroneous data generation for Grammatical Error Correction

Authors: Shuyao Xu, Chen Jin, Long Qin, Jiehao Zhang
Year of publication: 2019
Subject:
Source: BEA@ACL
DOI: 10.18653/v1/w19-4415
Description: It has been demonstrated that the utilization of a monolingual corpus in neural Grammatical Error Correction (GEC) systems can significantly improve the system performance. The previous state-of-the-art neural GEC system is an ensemble of four Transformer models pretrained on a large amount of Wikipedia Edits. The Singsound GEC system follows a similar approach but is equipped with a sophisticated erroneous data generating component. Our system achieved an F0.5 of 66.61 in the BEA 2019 Shared Task: Grammatical Error Correction. With our novel erroneous data generating component, the Singsound neural GEC system yielded an M2 of 63.2 on the CoNLL-2014 benchmark (8.4% relative improvement over the previous state-of-the-art system).
Database: OpenAIRE