Automatic Stochastic Arabic Spelling Correction With Emphasis on Space Insertions and Deletions
Authors: | Mohamed Alkanhal, Mohamed Al-Badrashiny, Mansour M. Alghamdi, Abdulaziz O. Al-Qabbany |
---|---|
Year of publication: | 2012 |
Subject: | Space (punctuation); Acoustics and Ultrasonics; Computer science; Character (computing); Stochastic process; business.industry; Speech recognition; Context (language use); computer.software_genre; Edit distance; Artificial intelligence; Electrical and Electronic Engineering; Marginal distribution; business; F1 score; computer; Word (computer architecture); Natural language processing |
Source: | IEEE Transactions on Audio, Speech, and Language Processing. 20:2111-2122 |
ISSN: | 1558-7924, 1558-7916 |
DOI: | 10.1109/tasl.2012.2197612 |
Description: | This paper presents a stochastic approach to correcting misspellings in Arabic text. A context-based two-layer system automatically corrects misspelled words in large datasets. The first layer produces, for each misspelled word, a list of candidate corrections ranked by Damerau-Levenshtein edit distance. The same layer also considers merged and split words that result from deleting or inserting the space character. The correct alternative for each misspelled word is then selected stochastically, based on the maximum marginal probability, via A* lattice search and m-gram probability estimation. A large dataset was used to build and test the system. The test results show that performance improves as the training set grows, reaching an F1 score of 97.9% for detection and 92.3% for correction. |
Database: | OpenAIRE |
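
The candidate-generation step described in the abstract (ranking alternatives by Damerau-Levenshtein edit distance, plus handling merged and split words) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `max_distance` threshold, and the toy Latin-script vocabulary are all assumptions made for illustration, and the distance uses the restricted "optimal string alignment" variant of Damerau-Levenshtein. The context-based selection step (A* lattice search with m-gram probabilities) is not covered here.

```python
def osa_distance(a: str, b: str) -> int:
    """Damerau-Levenshtein distance (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def rank_candidates(word, vocabulary, max_distance=2):
    """Return vocabulary words within max_distance, closest first.

    max_distance is a hypothetical cutoff, not a value from the paper.
    """
    scored = sorted((osa_distance(word, v), v) for v in vocabulary)
    return [v for dist, v in scored if dist <= max_distance]


def split_candidates(word, vocabulary):
    """Space-deletion errors: propose one inserted space between two known words."""
    return [
        f"{word[:i]} {word[i:]}"
        for i in range(1, len(word))
        if word[:i] in vocabulary and word[i:] in vocabulary
    ]


def merge_candidates(tokens, vocabulary):
    """Space-insertion errors: propose joining adjacent tokens into a known word."""
    return [
        (i, tokens[i] + tokens[i + 1])
        for i in range(len(tokens) - 1)
        if tokens[i] + tokens[i + 1] in vocabulary
    ]


# Toy vocabulary, assumed for illustration only.
VOCAB = {"the", "there", "in", "into", "to", "form", "from"}
```

In a full system, the candidates from all three functions would be placed in a lattice and the final choice made from context, which is the role the abstract assigns to the m-gram model and A* search.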