A Vietnamese language model based on Recurrent Neural Network

Author: Kiem-Hieu Nguyen, Viet-Trung Tran, Duc-Hanh Bui
Year of publication: 2016
Subject:
Source: KSE
Description: Language modeling plays a critical role in many natural language processing (NLP) tasks such as text prediction, machine translation and speech recognition. Traditional statistical language models (e.g., n-gram models) can only predict words that have been seen before and cannot capture long-range word context. Neural language models provide a promising solution to overcome this shortcoming of statistical language models. This paper investigates Recurrent Neural Network (RNN) language models for Vietnamese at the character and syllable levels. Experiments were conducted on a large dataset of 24M syllables, constructed from 1,500 movie subtitles. The experimental results show that our RNN-based language models yield reasonable performance on the movie subtitle dataset; concretely, they outperform n-gram language models in terms of perplexity.
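To illustrate the kind of model the abstract describes, the following is a minimal sketch of a character-level RNN language model evaluated with perplexity. It is not the authors' implementation; the PyTorch framework, the toy corpus, and all hyperparameters are assumptions made purely for illustration.

```python
# Minimal character-level RNN language model sketch (PyTorch).
# Illustrative only: toy corpus and hyperparameters are assumptions,
# not the setup used in the paper.
import math
import torch
import torch.nn as nn

corpus = "xin chao viet nam "            # toy stand-in for the subtitle data
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([stoi[c] for c in corpus])

class CharRNNLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):                 # x: (batch, seq_len)
        h, _ = self.rnn(self.emb(x))      # hidden state at every position
        return self.out(h)                # unnormalised next-character scores

model = CharRNNLM(len(chars))
optim = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)   # predict the next character
for _ in range(200):
    optim.zero_grad()
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    loss.backward()
    optim.step()

# Perplexity is the exponential of the average per-token cross-entropy,
# which is the metric the paper uses to compare against n-gram baselines.
print("perplexity:", math.exp(loss.item()))
```

A syllable-level variant would follow the same structure, with the vocabulary built over whitespace-separated Vietnamese syllables instead of individual characters.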
Database: OpenAIRE