GWU-HASP-2015@QALB-2015 Shared Task: Priming Spelling Candidates with Probability
Authors: | Mohamed Al-Badrashiny, Mona Diab, Mohammed Attia |
---|---|
Year of publication: | 2015 |
Subject: | Conditional random field; Arabic; Computer science; Speech recognition; Punctuation; Spelling; Task (project management); Classifier (linguistics); Artificial intelligence; Language model; Natural language processing |
Source: | ANLP@ACL |
Description: | In this paper, we describe our system HASP-2015 (Hybrid Arabic Spelling and Punctuation Corrector), which introduces significant improvements over our previous version, HASP-2014, and with which we participated in the QALB-2015 Second Shared Task on Arabic Error Correction. Our system uses probabilistic information on errors and their possible corrections in the training data and combines it with an open-source reference dictionary (or word list) to detect errors and to generate and filter candidates. We enhance the system further by allowing it to generate candidates for common semantic and grammatical errors. Finally, an n-gram language model selects the best candidates. We use a CRF (Conditional Random Fields) classifier to correct punctuation errors in a two-pass process: the system first learns punctuation placement and then learns to identify punctuation types. |
Database: | OpenAIRE |
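
The spelling pipeline summarized in the description (error probabilities from training data, a dictionary for detection and filtering, and an n-gram language model for selection) can be illustrated with a minimal sketch. This is not the HASP-2015 implementation: the function names, the toy English data, the bigram order, and the add-one smoothing are all assumptions chosen for illustration, and the punctuation CRF is not covered.

```python
import math
from collections import Counter, defaultdict


def build_error_model(pairs):
    """Estimate P(correction | error) from (error, correction) training pairs."""
    counts = defaultdict(Counter)
    for error, correction in pairs:
        counts[error][correction] += 1
    return {
        error: {c: n / sum(cs.values()) for c, n in cs.items()}
        for error, cs in counts.items()
    }


def build_bigram_model(sentences):
    """Return a log-probability function for a bigram model (add-one smoothing)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(unigrams)

    def log_prob(prev, word):
        return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

    return log_prob


def correct(tokens, error_model, dictionary, log_prob):
    """Flag out-of-dictionary tokens, then pick the best-scoring candidate."""
    output, prev = [], "<s>"
    for token in tokens:
        best = token
        if token not in dictionary:  # detection: token is not in the word list
            # generation and filtering: keep only candidates found in the dictionary
            candidates = {
                c: p for c, p in error_model.get(token, {}).items() if c in dictionary
            }
            if candidates:
                # selection: error-model probability combined with the language model
                best = max(
                    candidates,
                    key=lambda c: math.log(candidates[c]) + log_prob(prev, c),
                )
        output.append(best)
        prev = best
    return output


# Hypothetical toy data, for illustration only.
pairs = [("teh", "the"), ("teh", "the"), ("teh", "ten"), ("form", "from")]
dictionary = {"the", "ten", "from", "form", "cat", "sat", "on", "mat"}
corpus = ["the cat sat on the mat", "the cat sat"]

error_model = build_error_model(pairs)
log_prob = build_bigram_model(corpus)
print(correct("teh cat sat".split(), error_model, dictionary, log_prob))
```

In this sketch a token is only treated as an error when it is missing from the word list, so real-word errors such as "form" for "from" pass through unchanged; the description indicates that the actual system also generates candidates for common semantic and grammatical errors.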