Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
Author: | Li, Yuang; Wu, Yu; Li, Jinyu; Liu, Shujie |
---|---|
Language: | English |
Year of publication: | 2023 |
Subject: |
Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering |
Description: | The integration of language models (LMs) has proven to be an effective way to address domain shift in speech recognition. However, these approaches usually require a significant amount of target-domain text data to train the LMs. In contrast, this work proposes two zero-shot ASR domain adaptation methods that need only a domain-specific text prompt, using LLaMA, a 7-billion-parameter large language model (LLM). The LLM is used in two ways: 1) second-pass rescoring: reranking the N-best hypotheses of a given ASR system with LLaMA; 2) deep LLM-fusion: incorporating the LLM into the decoder of an encoder-decoder-based ASR system. Experiments show that, with only one domain prompt, both methods effectively reduce the word error rate (WER) on the out-of-domain TedLium-2 and SPGISpeech datasets. In particular, deep LLM-fusion has the advantage of better recall of entity names and out-of-vocabulary words. |
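The second-pass rescoring described above can be sketched in a few lines: each N-best hypothesis keeps its first-pass ASR score, an LM score is added with an interpolation weight, and the top-scoring hypothesis is returned. This is a minimal illustration only; the `lm_weight` value and the toy stand-in scorer are assumptions, not the paper's actual LLaMA setup or weighting.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=0.5):
    """Rerank N-best ASR hypotheses by combining first-pass ASR
    log-probabilities with language-model log-probabilities.

    hypotheses: list of (text, asr_logprob) pairs from the ASR system.
    lm_score:   callable returning an LM log-probability for a text
                (in the paper, LLaMA conditioned on a domain prompt).
    lm_weight:  interpolation weight (hypothetical value, tuned in practice).
    """
    scored = [(text, asr_lp + lm_weight * lm_score(text))
              for text, asr_lp in hypotheses]
    return max(scored, key=lambda pair: pair[1])[0]


# Toy stand-in for the LLM scorer: it favors hypotheses containing a
# domain word, mimicking how a prompt-conditioned LLM would score
# in-domain phrasing higher. Purely illustrative.
def toy_lm_score(text):
    return 0.0 if "earnings" in text else -5.0


nbest = [("the earnings call begins", -10.0),
         ("the earning scall begins", -9.5)]
best = rescore_nbest(nbest, toy_lm_score, lm_weight=1.0)
# best == "the earnings call begins": the LM score outweighs the
# slightly worse first-pass ASR score of the correct hypothesis.
```

With `lm_weight=0` this reduces to picking the first-pass ASR best; the weight controls how much the domain-prompted LM is trusted over the acoustic evidence.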
Database: | OpenAIRE |
External link: |