Arabic Statistical N-Gram Models.

Authors: Meftouh, K., Smaili, K., Laskri, M. T.
Subject:
Source: International Review on Computers & Software; Jan 2009, Vol. 4, Issue 1, pp. 68-72, 5 pp., 3 diagrams, 10 charts
Abstract: In this work we investigate statistical language models for Arabic. Several experiments using different smoothing techniques were carried out on a small corpus extracted from a daily newspaper. The sparseness of the data led us to investigate other solutions that do not require increasing the size of the corpus. Word segmentation was applied to the corpus in order to increase its statistical viability. This leads to better performance in terms of normalized perplexity. [ABSTRACT FROM AUTHOR]
Database: Complementary Index