Monolingual Recognizers Fusion for Code-switching Speech Recognition

Authors:	Song, Tongtong; Xu, Qiang; Lu, Haoyu; Wang, Longbiao; Shi, Hao; Lin, Yuqin; Yang, Yanbing; Dang, Jianwu
Publication year:	2022
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Computation and Language; Computer Science - Sound
Document type:	Working Paper
Description:	The bi-encoder structure has been intensively investigated in code-switching (CS) automatic speech recognition (ASR). However, most existing methods require the two monolingual ASR models (MAMs) to share the same structure, and they use only the encoders of the MAMs. As a result, pre-trained MAMs cannot be promptly and fully exploited for CS ASR. In this paper, we propose a monolingual recognizers fusion method for CS ASR. It has two stages: the speech awareness (SA) stage and the language fusion (LF) stage. In the SA stage, two independent MAMs map the acoustic features to two language-specific predictions. To keep each MAM focused on its own language, we further extend the language-aware training strategy for the MAMs. In the LF stage, the BELM fuses the two language-specific predictions to obtain the final prediction. Moreover, we propose a text simulation strategy to simplify the training of the BELM and reduce reliance on CS data. Experiments on a Mandarin-English corpus show the efficiency of the proposed method. The mix error rate is significantly reduced on the test set after using open-source pre-trained MAMs. Comment: Submitted to ICASSP 2023
Database:	arXiv
External link:	http://arxiv.org/abs/2211.01046