Discriminatively Trained Language Models Using Support Vector Machines for Language Identification

Authors: Herbert Gish, Man-Hung Siu, Lu-Feng Zhai, Xi Yang
Year of publication: 2006
Subject:
Source: Odyssey
Description: In this paper, we explore the use of Support Vector Machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance-related issues of class priors, data imbalance, feature weighting, score normalization, and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that SVM-trained n-grams are effective classifiers but are sensitive to changes in prior class distributions. With balanced prior distributions or score normalization procedures, the SVM-trained n-gram outperformed both the traditional n-gram used in parallel phoneme recognition with language modeling and GMM-UBM-based language identification systems, achieving more than 30% relative error reduction on the OGI-TS corpus.
Database: OpenAIRE