Noise-robust Speech Recognition with 10 Minutes Unparalleled In-domain Data

Author:	Chen, Chen; Hou, Nana; Hu, Yuchen; Shirol, Shashank; Chng, Eng Siong
Year of publication:	2022
Subject:	Computer Science - Sound; Computer Science - Computation and Language; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	Noise-robust speech recognition systems require large amounts of training data, including noisy speech and the corresponding transcripts, to achieve state-of-the-art performance across varied practical environments. However, such abundant in-domain data is not always available in the real world. In this paper, we propose a generative adversarial network that simulates a noisy spectrum from a clean spectrum (Simu-GAN), requiring only 10 minutes of unparalleled in-domain noisy speech data as labels. We also propose a dual-path speech recognition system to improve robustness under noisy conditions. Experimental results show that the proposed speech recognition system, using noisy data simulated by Simu-GAN, achieves a 7.3% absolute improvement in word error rate (WER) over the best baseline. Comment: Accepted by ICASSP 2022
Database:	arXiv
External link:	http://arxiv.org/abs/2203.15321
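
The description reports its result as an absolute improvement in word error rate (WER). As a reading aid, the following is a minimal sketch of how WER is conventionally computed (word-level edit distance divided by the number of reference words) and how an absolute improvement differs from a relative one. The function names, the example sentences, and the numeric values are hypothetical illustrations; they are not taken from the paper and do not reproduce its evaluation setup.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER, expressed in the same units as the inputs."""
    return baseline_wer - system_wer


def relative_improvement(baseline_wer: float, system_wer: float) -> float:
    """Difference in WER as a fraction of the baseline WER."""
    return (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")

    # Hypothetical WER values, not results from the paper.
    print(f"absolute: {absolute_improvement(0.25, 0.20):.3f}")
    print(f"relative: {relative_improvement(0.25, 0.20):.3f}")
```

Under this convention, an absolute improvement is a difference in WER percentage points, so the same absolute gain corresponds to a larger relative gain when the baseline WER is lower.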