Arabic Statistical N-gram Models

Author: Karima Meftouh, Smaili, K., Laskri, M. T.
Contributors: Université Badji Mokhtar - Annaba [Annaba] (UBMA), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), Université Badji Mokhtar Annaba (UBMA)
Language: English
Year of publication: 2009
Subject:
Source: International Review on Computers and Software (IRECOS)
International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1)
Scopus-Elsevier
ISSN: 1828-6003
1828-6011
Description: This work investigates statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied to increase the statistical viability of the corpus, which leads to better performance in terms of normalized perplexity.
Database: OpenAIRE