Improving Transcript-Based Video Retrieval Using Unsupervised Language Model Adaptation
Author: | Robert Herms, Thomas Wilhelm-Stein, Maximilian Eibl, Marc Ritter |
---|---|
Year of publication: | 2014 |
Subject: | Vocabulary; Information retrieval; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; German language; Data set (IBM mainframe); Language model; Artificial intelligence; Web resource; Document retrieval; Adaptation (computer science); Natural language processing; Utterance |
Source: | Lecture Notes in Computer Science; ISBN: 9783319113814; CLEF |
Description: | One challenge in automated speech recognition is recognizing domain-specific vocabulary, such as names, brands, and technical terms, with generic language models. Broadcast news is especially affected, because new names occur frequently. We present an unsupervised method for language model adaptation, used in automated speech recognition with a two-pass decoding strategy, to improve spoken document retrieval on broadcast news. After keywords are extracted from each utterance, a web resource is queried to collect utterance-specific adaptation data. These data are used to augment the phonetic dictionary and adapt the basic language model. We evaluated this strategy on a data set of summarized German broadcast news using a basic retrieval setup. |
Database: | OpenAIRE |
External link: |
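
The adaptation step described in the abstract (extract keywords from a first-pass transcript, collect utterance-specific text, then augment the dictionary and adapt the language model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not say how the authors extract keywords or combine the models, so this sketch uses frequency-based keywords, unigram models, and linear interpolation. The function names (`extract_keywords`, `interpolate`, `oov_words`) and the weight `lam` are hypothetical, and the web query is left out; the adaptation text is passed in directly.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def extract_keywords(utterance, stopwords, top_k=3):
    """Return up to top_k most frequent non-stopword tokens (assumed heuristic)."""
    tokens = [t for t in tokenize(utterance) if t not in stopwords]
    return [word for word, _ in Counter(tokens).most_common(top_k)]


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities for a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(base, adapt, lam=0.5):
    """Linearly interpolate two unigram models (assumed adaptation scheme).

    lam is the weight given to the adaptation model.
    """
    vocab = set(base) | set(adapt)
    return {
        w: (1.0 - lam) * base.get(w, 0.0) + lam * adapt.get(w, 0.0)
        for w in vocab
    }


def oov_words(adapt_tokens, dictionary):
    """Words in the adaptation data that the phonetic dictionary lacks."""
    return sorted(set(adapt_tokens) - set(dictionary))


if __name__ == "__main__":
    # Toy stand-ins for the generic model and the collected web text.
    base_model = unigram_probs(tokenize("die nachrichten die wetter"))
    web_tokens = tokenize("Merkel die")
    adapted = interpolate(base_model, unigram_probs(web_tokens), lam=0.5)
    print(adapted)
    print(oov_words(web_tokens, {"die", "nachrichten", "wetter"}))
```

In a two-pass setup, the words returned by `oov_words` would still need pronunciations (for example from a grapheme-to-phoneme tool) before the second decoding pass; that step is outside this sketch.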