Automatic Building of a Large Arabic Spelling Error Corpus

Authors: Aichaoui, Shaimaa Ben; Hiri, Nawel; Dahou, Abdelhalim Hafedh; Cheragui, Mohamed Amine
Source: SN Computer Science; March 2023, Vol. 4, Issue 2
Abstract: Spell checking is a classical topic in natural language processing, and the corpus has become an important component in developing spell checkers, especially with the emergence of stochastic and machine learning approaches that exploit corpora to build resolution models. Our work has two phases. The first is to build a corpus, which we call SPIRAL, dedicated to the detection and correction of spelling errors in Arabic texts. The second is to assess the impact of this corpus through an experimental study using a deep learning model, AraBART. The F1 scores obtained were: 80.2% for morphology errors, 81.6% for phonetic errors, 73% for physical errors, 78.3% for permutation errors, 64.3% for keyboard errors, 33.7% for delete errors, 86% for space-issue errors, and 84.5% for tachkil errors.
Database: Supplemental Index