Arabic Statistical N-gram Models
Author: | Karima Meftouh, Smaili, K., Laskri, M. T. |
---|---|
Contributors: | Université Badji Mokhtar - Annaba [Annaba] (UBMA), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria), Université Henri Poincaré - Nancy 1 (UHP), Université Nancy 2, Institut National Polytechnique de Lorraine (INPL), Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Year of publication: | 2009 |
Subject: | |
Source: | International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1); Scopus-Elsevier |
ISSN: | 1828-6003 1828-6011 |
Description: | In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity. |
Database: | OpenAIRE |
External link: |
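
Illustrative note: the description refers to smoothed n-gram models evaluated by perplexity. The sketch below is not the authors' implementation; it assumes a bigram model with add-one (Laplace) smoothing, which is only one of many possible smoothing techniques, and a toy corpus of placeholder tokens. The function names and the `norm_count` parameter are hypothetical. Since the record does not define "normalized perplexity", `norm_count` merely shows one conceivable normalization: dividing the log-probability by a fixed count (for example the number of unsegmented words) so that word-level and segment-level models can be compared.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def perplexity(sentences, unigrams, bigrams, vocab_size, norm_count=None):
    """Perplexity of the smoothed bigram model on tokenized sentences.

    norm_count (assumption): if given, the log-probability is divided by this
    count instead of the number of predicted tokens.
    """
    log_prob, n_events = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log(
                bigram_prob(prev, word, unigrams, bigrams, vocab_size)
            )
            n_events += 1
    n = norm_count if norm_count is not None else n_events
    return math.exp(-log_prob / n)


# Toy training data; the tokens are placeholders, not real corpus content.
train = [["a", "b"], ["a", "c"]]
unigrams, bigrams = train_bigram(train)
vocab_size = len(unigrams)  # vocabulary here includes the boundary markers
pp = perplexity([["a", "b"]], unigrams, bigrams, vocab_size)
```

With segmentation, the same functions would be applied to sentences whose words have been split into smaller units; the vocabulary shrinks and counts per unit grow, which is the sparseness effect the description alludes to.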