Erroneous data generation for Grammatical Error Correction
Authors: | Shuyao Xu, Chen Jin, Long Qin, Jiehao Zhang |
---|---|
Year of publication: | 2019 |
Subject: | Computer science; Test data generation; Grammatical error; Artificial intelligence; Natural language processing; Transformer (machine learning model) |
Source: | BEA@ACL |
DOI: | 10.18653/v1/w19-4415 |
Description: | It has been demonstrated that the utilization of a monolingual corpus in neural Grammatical Error Correction (GEC) systems can significantly improve the system performance. The previous state-of-the-art neural GEC system is an ensemble of four Transformer models pretrained on a large amount of Wikipedia Edits. The Singsound GEC system follows a similar approach but is equipped with a sophisticated erroneous data generating component. Our system achieved an F0.5 of 66.61 in the BEA 2019 Shared Task: Grammatical Error Correction. With our novel erroneous data generating component, the Singsound neural GEC system yielded an M2 of 63.2 on the CoNLL-2014 benchmark (8.4% relative improvement over the previous state-of-the-art system). |
Database: | OpenAIRE |
External link: |