GWU-HASP-2015@QALB-2015 Shared Task: Priming Spelling Candidates with Probability

Authors: Mohamed Al-Badrashiny, Mona Diab, Mohammed Attia
Year of publication: 2015
Subject:
Source: ANLP@ACL
Description: In this paper, we describe our system HASP-2015 (Hybrid Arabic Spelling and Punctuation Corrector), which introduces significant improvements over our previous version, HASP-2014, and with which we participated in the QALB-2015 Second Shared Task on Arabic Error Correction. Our system uses probabilistic information on errors and their possible corrections in the training data and combines it with an open-source reference dictionary (or word list) to detect errors and to generate and filter candidates. We enhance the system further by allowing it to generate candidates for common semantic and grammatical errors. Finally, an n-gram language model selects the best candidates. We use a CRF (Conditional Random Fields) classifier to correct punctuation errors in a two-pass process in which the system first learns punctuation placement and then learns to identify punctuation types.
Database: OpenAIRE
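
Illustration: the spelling pipeline outlined in the abstract (error-to-correction probabilities learned from training data, filtering against a word list, selection with an n-gram language model) might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the HASP-2015 implementation: all function names, the English toy data, the greedy left-to-right search, and the bigram scores are hypothetical, and the sketch treats every in-dictionary word as correct, whereas the abstract says the system also proposes candidates for some semantic and grammatical errors.

```python
from collections import Counter, defaultdict


def learn_correction_probs(pairs):
    """Estimate P(correction | error) by relative frequency over
    (error, correction) pairs taken from aligned training data."""
    counts = defaultdict(Counter)
    for error, correction in pairs:
        counts[error][correction] += 1
    return {
        error: {c: n / sum(ctr.values()) for c, n in ctr.items()}
        for error, ctr in counts.items()
    }


def candidates(word, probs, dictionary):
    """Return {candidate: prior}. Simplification (assumption): words found
    in the dictionary are kept as they are; unknown words receive the
    learned corrections that survive the dictionary filter."""
    if word in dictionary:
        return {word: 1.0}
    kept = {c: p for c, p in probs.get(word, {}).items() if c in dictionary}
    return kept or {word: 1.0}  # nothing usable: leave the word unchanged


def correct(tokens, probs, dictionary, bigram_score):
    """Greedy left-to-right selection: for each token, pick the candidate
    maximizing prior * bigram_score(previous_output, candidate)."""
    output, prev = [], "<s>"
    for token in tokens:
        cands = candidates(token, probs, dictionary)
        best = max(cands, key=lambda c: cands[c] * bigram_score(prev, c))
        output.append(best)
        prev = best
    return output


# Toy data (hypothetical, English for readability).
TRAIN_PAIRS = [("teh", "the"), ("teh", "the"), ("teh", "ten"),
               ("recieve", "receive")]
DICTIONARY = {"i", "the", "ten", "receive", "letter"}
BIGRAMS = {("<s>", "i"): 0.5, ("i", "receive"): 0.4,
           ("receive", "the"): 0.5, ("the", "letter"): 0.3}


def toy_bigram_score(prev, word):
    """Stand-in for a real n-gram language model; unseen bigrams get a
    small constant score."""
    return BIGRAMS.get((prev, word), 0.01)


PROBS = learn_correction_probs(TRAIN_PAIRS)
RESULT = correct(["i", "recieve", "teh", "letter"],
                 PROBS, DICTIONARY, toy_bigram_score)
# RESULT == ["i", "receive", "the", "letter"]
```

A real system would more likely replace the greedy loop with a search over whole candidate sequences (for example Viterbi decoding) and the toy scores with a trained language model; the greedy form is used here only to keep the example short.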