Grammar Correction for Multiple Errors in Chinese Based on Prompt Templates

Author:	Zhici Wang, Qiancheng Yu, Jinyun Wang, Zhiyong Hu, Aoqiang Wang
Language:	English
Year of publication:	2023
Subject:	Chinese grammar error correction; prompt templates; pretrained models; bidirectional long short-term memory network; conditional random fields; confusion set; Technology; Engineering (General). Civil engineering (General) TA1-2040; Biology (General) QH301-705.5; Physics QC1-999; Chemistry QD1-999
Source:	Applied Sciences, Vol 13, Iss 15, p 8858 (2023)
Document type:	article
ISSN:	2076-3417
DOI:	10.3390/app13158858
Description:	Grammar error correction (GEC) is a key task in Natural Language Processing (NLP). Its objective is to automatically detect and correct grammatical errors in sentences, and it has considerable research and application value. Mainstream grammar-correction methods currently rely on sequence labeling or text generation, both of which are end-to-end approaches. These methods perform well when error density is low but often give unsatisfactory results at high error density, where a single sentence contains multiple errors; in such cases they tend to overcorrect words that are already correct, producing a high false-positive rate. To address this problem, we studied the characteristics of the Chinese grammar error correction (CGEC) task in high-error-density settings and propose a grammar-correction method based on prompt templates. First, we propose a strategy for constructing prompt templates suited to CGEC, which recasts CGEC as a masked fill-in-the-blank task compatible with the masked language model BERT. Second, we propose a method for dynamically updating templates, which incorporates already-corrected errors into the template to improve its quality. In addition, we use phonetic and graphical (visual) similarity knowledge from a confusion set as guiding information; combined with BERT's predictions, this helps the model select the correct characters and substantially improves the accuracy of its corrections. We validated our methods through experiments on a public grammar-correction dataset. The results indicate that our method achieves higher correction performance and a lower false-correction rate in high-error-density scenarios.
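The fill-in-the-blank reformulation, dynamic template update, and confusion-set guidance described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `[MASK]` template format, the helper names (`build_template`, `select_candidate`, `correct`), the toy confusion set and the toy probabilities are all hypothetical. `toy_predict` stands in for a masked language model such as BERT, and the positions to correct are assumed to come from a separate error-detection step.

```python
from typing import Callable, Dict, List, Set

MASK = "[MASK]"  # BERT-style mask token; the paper's exact template format is an assumption here

# Hypothetical interface: maps a masked template to probabilities for the masked slot.
PredictFn = Callable[[str], Dict[str, float]]


def build_template(chars: List[str], position: int) -> str:
    """Replace the character at `position` with the mask token (hypothetical template)."""
    masked = chars.copy()
    masked[position] = MASK
    return "".join(masked)


def select_candidate(original: str, probs: Dict[str, float],
                     confusion_set: Dict[str, Set[str]]) -> str:
    """Pick the most probable character among the original and its confusable characters."""
    candidates = {original} | confusion_set.get(original, set())
    # Unscored characters get probability 0.0; on a tie, keeping the original wins.
    # Sorting makes the choice deterministic when two replacements tie.
    return max(sorted(candidates), key=lambda c: (probs.get(c, 0.0), c == original))


def correct(sentence: str, positions: List[int], predict: PredictFn,
            confusion_set: Dict[str, Set[str]]) -> str:
    """Correct suspected positions left to right, feeding earlier fixes into later templates."""
    chars = list(sentence)
    for pos in positions:
        template = build_template(chars, pos)  # already reflects corrections made so far
        chars[pos] = select_candidate(chars[pos], predict(template), confusion_set)
    return "".join(chars)


# Toy confusion set (visually/phonetically similar characters) and a stand-in "model".
confusion = {"门": {"们", "问", "闷"}, "在": {"再"}}
seen: List[str] = []


def toy_predict(template: str) -> Dict[str, float]:
    seen.append(template)  # record templates to show the dynamic update
    if template == "我[MASK]明天在去":
        return {"的": 0.7, "们": 0.2, "门": 0.05}
    if template == "我们明天[MASK]去":
        return {"再": 0.6, "在": 0.3}
    return {}


print(correct("我门明天在去", [1, 4], toy_predict, confusion))  # prints 我们明天再去
```

Restricting candidates to the confusion set lets `们` win at position 1 even though the stand-in model scores the unrelated `的` higher. Because position 1 is fixed before position 4 is masked, the second template already contains the correction. Ties favour keeping the original character, a simple (assumed) guard against overcorrection.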
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/6f799dafaf6349188804141441b60666