Improving Transcript-Based Video Retrieval Using Unsupervised Language Model Adaptation

Authors: Robert Herms, Thomas Wilhelm-Stein, Maximilian Eibl, Marc Ritter
Publication year: 2014
Subject:
Source: Lecture Notes in Computer Science ISBN: 9783319113814
CLEF
Description: One challenge in automated speech recognition is recognizing domain-specific vocabulary, such as names, brands, and technical terms, when using generic language models. New names occur especially frequently in broadcast news. We present an unsupervised method for language model adaptation, used in automated speech recognition with a two-pass decoding strategy, to improve spoken document retrieval on broadcast news. After keywords are extracted from each utterance, a web resource is queried to collect utterance-specific adaptation data. This data is used to augment the phonetic dictionary and adapt the basic language model. We evaluated this strategy on a data set of summarized German broadcast news using a basic retrieval setup.
Database: OpenAIRE