Online LDA-Based Language Model Adaptation

Authors:	Aleš Pražák, Jan Lehečka
Year of publication:	2018
Subject:	Topic model; Text corpus; Perplexity; Computer science; Networking & telecommunications; Engineering and technology; Latent Dirichlet allocation; Task (project management); Reduction (complexity); Pattern recognition; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Language model; Adaptation (computer science); Natural language processing
Source:	Text, Speech, and Dialogue (TSD), ISBN: 9783030007935
DOI:	10.1007/978-3-030-00794-2_36
Description:	In this paper, we present our improvements in online topic-based language model adaptation. Our aim is to improve automatic speech recognition of multi-topic speech that must be recognized in real time (online). Latent Dirichlet Allocation (LDA) is an unsupervised topic model designed to uncover hidden semantic relationships between words and documents in a text corpus and thus reveal latent topics automatically. We use LDA to cluster the text corpus and to predict topics online from partial hypotheses during real-time speech recognition. Based on detected topic changes in the speech, we adapt the language model on the fly. We demonstrate the improvement of our system on the task of online subtitling of TV news, where we achieved an 18% relative reduction of perplexity and a 3.52% relative reduction of WER over the non-adapted system.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::17a1dd5ba12f713d4c22612dc9f18122 https://doi.org/10.1007/978-3-030-00794-2_36
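
The abstract describes predicting topics online from partial hypotheses and reacting to topic changes. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the topic-word probabilities are already fixed (in practice they would come from a trained LDA model) and estimates the topic mixture of a partial hypothesis with a simple EM "folding-in" step instead of full LDA inference. The `TOPICS` table, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical toy topic-word probabilities (assumption: a trained LDA
# model would supply these; the values here are made up).
TOPICS = {
    "sport": {"goal": 0.4, "match": 0.4, "vote": 0.1, "party": 0.1},
    "politics": {"goal": 0.05, "match": 0.05, "vote": 0.5, "party": 0.4},
}


def infer_topic_weights(words, topics, iterations=20):
    """Estimate topic mixture weights for a word sequence.

    Uses EM with fixed topic-word probabilities (a simplification,
    not full LDA variational inference).
    """
    names = list(topics)
    weights = {t: 1.0 / len(names) for t in names}
    # Ignore words that no topic knows about.
    known = [w for w in words if any(w in topics[t] for t in names)]
    if not known:
        return weights
    for _ in range(iterations):
        counts = {t: 0.0 for t in names}
        for w in known:
            # E-step: posterior probability of each topic for this word.
            joint = {t: weights[t] * topics[t].get(w, 1e-12) for t in names}
            total = sum(joint.values())
            for t in names:
                counts[t] += joint[t] / total
        # M-step: renormalise expected counts into mixture weights.
        weights = {t: counts[t] / len(known) for t in names}
    return weights


def dominant_topic(weights):
    """Return the topic with the largest mixture weight."""
    return max(weights, key=weights.get)


def relative_reduction(baseline, adapted):
    """Relative reduction of a metric such as perplexity or WER."""
    return (baseline - adapted) / baseline


# Simulated stream of partial hypotheses from a recogniser.
current = None
switches = []
for hypothesis in ["goal match", "goal match match", "vote party vote"]:
    topic = dominant_topic(infer_topic_weights(hypothesis.split(), TOPICS))
    if topic != current:
        # A real system would swap in the topic-adapted language model here.
        switches.append(topic)
        current = topic
```

In this sketch, each detected change of the dominant topic marks the point where an adapted language model would be selected. `relative_reduction` only shows how figures such as the reported 18% are computed, for example `relative_reduction(100.0, 82.0)` with made-up perplexities; the abstract does not give the absolute baseline values.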