Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Author: | Li, Yuang; Wu, Yu; Li, Jinyu; Liu, Shujie |
---|---|
Language: | English |
Year of publication: | 2023 |
Subject: |
Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering |
Description: | The integration of language models (LMs) has proven to be an effective way to address domain shift in speech recognition. However, these approaches usually require a significant amount of target-domain text data to train the LMs. In contrast, this work proposes two zero-shot ASR domain adaptation methods that need only a domain-specific text prompt, using LLaMA, a 7-billion-parameter large language model (LLM). The LLM is used in two ways: 1) second-pass rescoring: reranking the N-best hypotheses of a given ASR system with LLaMA; 2) deep LLM-fusion: incorporating the LLM into the decoder of an encoder-decoder-based ASR system. Experiments show that, with only one domain prompt, both methods effectively reduce the word error rate (WER) on the out-of-domain TedLium-2 and SPGISpeech datasets. In particular, deep LLM-fusion has the advantage of better recall of entity names and out-of-vocabulary words. |
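The second-pass rescoring described above can be sketched in a few lines: each N-best hypothesis keeps its first-pass ASR score, an LM score is added with an interpolation weight, and the top-scoring hypothesis is returned. This is a minimal illustration only; the `lm_weight` value and the toy stand-in scorer are assumptions, not the paper's actual LLaMA setup or weighting.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=0.5):
    """Rerank N-best ASR hypotheses by combining first-pass ASR
    log-probabilities with language-model log-probabilities.

    hypotheses: list of (text, asr_logprob) pairs from the ASR system.
    lm_score:   callable returning an LM log-probability for a text
                (in the paper, LLaMA conditioned on a domain prompt).
    lm_weight:  interpolation weight (hypothetical value, tuned in practice).
    """
    scored = [(text, asr_lp + lm_weight * lm_score(text))
              for text, asr_lp in hypotheses]
    return max(scored, key=lambda pair: pair[1])[0]


# Toy stand-in for the LLM scorer: it favors hypotheses containing a
# domain word, mimicking how a prompt-conditioned LLM would score
# in-domain phrasing higher. Purely illustrative.
def toy_lm_score(text):
    return 0.0 if "earnings" in text else -5.0


nbest = [("the earnings call begins", -10.0),
         ("the earning scall begins", -9.5)]
best = rescore_nbest(nbest, toy_lm_score, lm_weight=1.0)
# best == "the earnings call begins": the LM score outweighs the
# slightly worse first-pass ASR score of the correct hypothesis.
```

With `lm_weight=0` this reduces to picking the first-pass ASR best; the weight controls how much the domain-prompted LM is trusted over the acoustic evidence.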
Database: | OpenAIRE |
External link: |