Discriminatively trained Language Models using Support Vector Machines for Language Identification
Author: Herbert Gish, Man-Hung Siu, Lu-Feng Zhai, Xi Yang
Year: 2006
Subject: Language identification, Support vector machine, Language model, Speech recognition, Mixture model, Prior probability, Normalization (statistics), Weighting, Pattern recognition, Machine learning, Artificial intelligence, Natural language, Computer science
Source: Odyssey
Description: In this paper, we explore the use of Support Vector Machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance-related issues of class priors, data imbalance, feature weighting, score normalization, and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that SVM-trained n-grams are effective classifiers but are sensitive to changes in the prior class distributions. Using balanced prior distributions or score normalization procedures, the SVM-trained n-grams outperformed traditional n-grams in parallel phoneme recognition with language model and GMM-UBM-based language identification systems by more than 30% relative error reduction on the OGI-TS corpus.
Database: OpenAIRE
External link:
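The core idea in the abstract, representing each utterance by its character n-gram counts and training a linear SVM to separate languages, can be sketched as follows. This is a toy illustration, not the authors' implementation: the corpus, the raw (unmodified) n-gram counts, and the Pegasos-style subgradient training are all assumptions for the sake of a self-contained example.

```python
from collections import Counter

def ngram_counts(text, n=3):
    """Character n-gram counts used as the sparse feature vector for one utterance."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def train_linear_svm(samples, labels, lam=0.01, epochs=200):
    """Train a linear SVM on sparse count features via Pegasos-style
    hinge-loss subgradient descent with L2 regularization strength lam."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            score = sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:                      # L2 shrinkage step
                w[f] *= (1.0 - eta * lam)
            if y * score < 1.0:              # inside the margin: hinge subgradient
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, x):
    """Sign of the linear score decides which of the two languages is chosen."""
    return 1 if sum(w.get(f, 0.0) * v for f, v in x.items()) >= 0.0 else -1
```

A real system would, as the paper discusses, also need to handle class priors, data imbalance, feature weighting, and score normalization; none of that is modeled in this sketch.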