UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition
Author: | Zhiyu Wang, Batushiren, Xiangdong Su, Xiang Hao, Hui Zhang |
---|---|
Year of publication: | 2020 |
Subject: |
Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS); Computer science; Speech recognition; Deep learning; Speech enhancement; Intelligibility (communication); Signal-to-noise ratio; Time domain; Artificial intelligence; PESQ |
Source: | INTERSPEECH |
DOI: | 10.48550/arxiv.2010.15521 |
Description: | Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem that has rarely been investigated in previous work. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to address this problem. The approach consists of a generator network and a discriminator network, both of which operate directly in the time domain. The generator adopts a U-Net-like structure and employs dilated convolution in its bottleneck. We evaluate the performance of UNetGAN at low SNR conditions (down to -20 dB) on a public benchmark. The results demonstrate that it significantly improves speech quality and substantially outperforms representative deep learning models, including SEGAN, cGAN for SE, a bidirectional LSTM trained with the phase-sensitive spectrum approximation cost function (PSA-BLSTM), and Wave-U-Net, in terms of Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ). Comment: Published in Interspeech 2019 |
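The description notes that the generator uses dilated convolutions in its U-Net bottleneck, which enlarge the receptive field over the raw waveform without adding parameters. As a minimal illustrative sketch of the operation itself (pure Python, not the paper's implementation; the kernel values and sizes below are hypothetical):

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D convolution (cross-correlation form) with a dilated kernel.

    A dilation of d spaces the kernel taps d samples apart, so a kernel of
    length k covers an effective span of (k - 1) * d + 1 input samples.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field of this layer
    out = []
    for start in range(len(signal) - span + 1):
        # Multiply each kernel tap with an input sample d steps apart.
        out.append(sum(signal[start + i * dilation] * kernel[i] for i in range(k)))
    return out


# Hypothetical example: a length-2 averaging-style kernel over a toy signal.
x = [1, 2, 3, 4, 5]
print(dilated_conv1d(x, [1, 1], dilation=1))  # adjacent samples: [3, 5, 7, 9]
print(dilated_conv1d(x, [1, 1], dilation=2))  # samples 2 apart:  [4, 6, 8]
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially with depth, which is why dilated convolutions are a common choice for the bottleneck of time-domain speech models.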
Database: | OpenAIRE |
External link: |