Discriminatively trained Language Models using Support Vector Machines for Language Identification

Authors:	Herbert Gish, Man-Hung Siu, Lu-Feng Zhai, Xi Yang
Year of publication:	2006
Subject:	Normalization (statistics); Language identification; Computer science; Speech recognition; Mixture model; Machine learning; Weighting; Support vector machine; ComputingMethodologies_PATTERNRECOGNITION; Prior probability; Language model; Artificial intelligence; Natural language
Source:	Odyssey
Description:	In this paper, we explore the use of Support Vector Machines (SVMs) to learn a discriminatively trained n-gram model for automatic language identification. Our focus is on practical considerations that make SVM technology more effective. We address the performance-related issues of class priors, data imbalance, feature weighting, score normalization, and combining multiple knowledge sources with SVMs. Using modified n-gram counts as features, we show that the SVM-trained n-grams are effective classifiers but are sensitive to changes in prior class distributions. With balanced prior distributions or score normalization procedures, the SVM-trained n-gram outperformed both the traditional n-gram used in parallel phoneme recognition with language model systems and GMM-UBM-based language identification systems, with a relative error reduction of more than 30% on the OGI-TS corpus.
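The idea in the abstract (n-gram counts as SVM features, with class balancing to offset skewed priors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' method: the abstract does not say how the n-gram counts are modified, so plain relative frequencies are used; the trainer is a generic Pegasos-style subgradient solver for a binary linear SVM; and the names `ngram_features`, `balanced_weights`, `train_linear_svm`, and `score` are hypothetical. Real language identification is multi-class and would typically combine several such binary classifiers.

```python
from collections import Counter
import random


def ngram_features(tokens, n=2):
    """Relative-frequency n-gram counts for one token (e.g. phoneme) sequence.
    Assumption: the paper's actual count modification is not given in the abstract."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values()) or 1
    return {g: c / total for g, c in counts.items()}


def balanced_weights(labels):
    """Per-class weights inversely proportional to class frequency,
    so that each class contributes equally to the training loss."""
    freq = Counter(labels)
    n, k = len(labels), len(freq)
    return {y: n / (k * c) for y, c in freq.items()}


def score(w, x):
    """Linear decision score for a sparse feature dict x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, epochs=50, lam=0.01, seed=0):
    """Binary linear SVM (labels +1 / -1) trained by Pegasos-style
    subgradient descent on a class-weighted hinge loss."""
    rng = random.Random(seed)
    cw = balanced_weights(labels)
    w = {}
    t = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            x, y = samples[i], labels[i]
            margin = y * score(w, x)        # computed before the update
            shrink = 1.0 - eta * lam        # L2 regularization step
            for f in w:
                w[f] *= shrink
            if margin < 1:                  # hinge loss is active
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * cw[y] * y * v
    return w


# Toy, imbalanced data: two "+1" sequences versus four "-1" sequences.
train_seqs = [
    ("a b a b a b".split(), +1),
    ("b a b a b".split(), +1),
    ("a a a a".split(), -1),
    ("b b b b b".split(), -1),
    ("a a a a a a".split(), -1),
    ("b b b".split(), -1),
]
X = [ngram_features(seq) for seq, _ in train_seqs]
y = [label for _, label in train_seqs]
w = train_linear_svm(X, y)
```

A new sequence is classified by the sign of `score(w, ngram_features(tokens))`. The class weights here play the role of balancing the priors at training time; score normalization, which the abstract names as an alternative, is not shown.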
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::1124959630324ac3a7f67b77a0584232 https://doi.org/10.1109/odyssey.2006.248098