Comparative study of Arabic and French statistical language models

Author:	Karima Meftouh, Smaili, K., Laskri, M. T.
Contributors:	Laboratoire de Recherche en Informatique (LRI-ANNABA), Université Badji Mokhtar - Annaba [Annaba] (UBMA), Analysis, perception and recognition of speech (PAROLE), INRIA Lorraine, Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université Henri Poincaré - Nancy 1 (UHP)-Université Nancy 2-Institut National Polytechnique de Lorraine (INPL)-Centre National de la Recherche Scientifique (CNRS), INSTICC
Language:	English
Year of publication:	2009
Subject:	n-gram model; Arabic; French; perplexity; Statistical language modeling; smoothing technique; performance; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; vocabulary
Source:	ICAART'09-International Conference on Agents and Artificial Intelligence, INSTICC, Jan 2009, Porto, Portugal; Scopus-Elsevier
Description:	International audience; In this paper, we present a comparative study of statistical language models for Arabic and French. The objective is to understand how best to model each of the two languages. Several experiments using different smoothing techniques were carried out. For French, trigram models are the most appropriate whatever the smoothing technique used. For Arabic, higher-order n-gram models smoothed with the Witten-Bell method are more efficient. The tests were performed with corpora and vocabularies of comparable size.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::69cb0b623fdac43049bb2717695c9b11 https://hal.inria.fr/inria-00352927/file/ICAART.pdf