Arabic Statistical N-gram Models

Author:	Karima Meftouh, Smaili, K., Laskri, M. T.
Contributors:	Université Badji Mokhtar - Annaba [Annaba] (UBMA), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), Université Badji Mokhtar Annaba (UBMA)
Language:	English
Publication year:	2009
Subject:	Statistical language model; Arabic word-based n-gram models; N-gram models; Perplexity; Morpheme-based n-gram models; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]
Source:	International Review on Computers and Software (IRECOS), Praise Worthy Prize, 2009, 4 (1); Scopus-Elsevier
ISSN:	1828-6003, 1828-6011
Description:	International audience. In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. Data sparseness led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied in order to increase the statistical viability of the corpus. This leads to better performance in terms of normalized perplexity.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::852421efd75da68b73fbd00db7c3e264 https://hal.inria.fr/hal-01639807