Language identification for internet security in the Basque context: A cross-lingual approach.

Authors: Barroso, Nora; de Ipina, Karmele Lopez; Ezeiza, Aitzol; Hernandez, Carmen
Source: IEEE Aerospace & Electronic Systems; Aug2013 Part 1, Vol. 28 Issue 8 Part 1, p24-31, 8p
Abstract: The present work describes the development of a language identification (LID) system suited to security tasks on the Internet. The development context was the Infozazpi Internet digital radio, and the task presented substantial complexity due to the trilingual environment and the scarcity of language resources for Basque. To overcome these difficulties, we propose a hybrid system based on the selection of subword units by SVMs, MLP classifiers, and discriminant analysis improved with robust regularized covariance matrix estimation methods and stochastic methods for ASR tasks (SC-HMM and n-grams). Our new subword unit proposals and the use of triphones and cross-lingual approaches considerably improve the system performance, achieving an optimal and stable LID recognition rate despite the complexity of the problem. [ABSTRACT FROM PUBLISHER]
Database: Complementary Index