A Dual Stream Generative Adversarial Network with Phase Awareness for Speech Enhancement

Authors:	Xintao Liang, Yuhang Li, Xiaomin Li, Yue Zhang, Youdong Ding
Language:	English
Publication year:	2023
Subject:	speech enhancement GAN transformer phase spectrogram dual stream Information technology T58.5-58.64
Source:	Information, Vol 14, Iss 4, p 221 (2023)
Document type:	article
ISSN:	2078-2489
DOI:	10.3390/info14040221
Description:	Single-channel speech enhancement under unknown noise conditions is a challenging problem. Most existing time-frequency domain methods operate on the amplitude spectrogram and ignore the phase mismatch between noisy speech and clean speech, which largely limits enhancement performance. To address the phase mismatch and further improve enhancement performance, this paper proposes a dual-stream Generative Adversarial Network (GAN) with phase awareness, named DPGAN. The generator uses a dual-stream structure to predict amplitude and phase separately, with an information communication module between the two streams so that the phase information is fully used. To make the prediction more efficient, the generator is built from Transformer blocks, which can learn the structural properties of sound more easily. Finally, a perceptually guided discriminator quantitatively evaluates speech quality, optimising the generator for specific evaluation metrics. Experiments on the widely used Voicebank-DEMAND dataset show that DPGAN achieves state-of-the-art results on most metrics.
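The phase mismatch named in the description can be illustrated with a minimal sketch. This is not DPGAN or any part of it: it assumes a plain DFT over one short frame instead of an STFT, a synthetic tone as "clean speech", and a deterministic second signal as "noise". All function and variable names are hypothetical. It compares resynthesis from the clean amplitude paired with the clean phase against the clean amplitude paired with the noisy phase, which is what an amplitude-only method effectively reuses.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def split(spectrum):
    """Split a complex spectrum into amplitude and phase lists."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def combine(amplitude, phase):
    """Rebuild a complex spectrum from amplitude and phase lists."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


n = 64
# Assumed toy signals: a single tone as "clean speech" and a deterministic
# interferer that overlaps the same frequency bin with a different phase.
clean = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
noise = [
    0.8 * math.cos(2 * math.pi * 3 * t / n) + 0.5 * math.sin(2 * math.pi * 7 * t / n)
    for t in range(n)
]
noisy = [c + v for c, v in zip(clean, noise)]

clean_amp, clean_phase = split(dft(clean))
_, noisy_phase = split(dft(noisy))

# Perfect amplitude estimate in both cases; only the phase source differs.
with_clean_phase = idft(combine(clean_amp, clean_phase))
with_noisy_phase = idft(combine(clean_amp, noisy_phase))

error_with_clean_phase = mse(clean, with_clean_phase)
error_with_noisy_phase = mse(clean, with_noisy_phase)

print("MSE using clean phase:", error_with_clean_phase)
print("MSE using noisy phase:", error_with_noisy_phase)
```

In this toy setting, even a perfect amplitude estimate leaves a clear residual error when the noisy phase is reused, which is the gap a separate phase stream is meant to close.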
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/f61f5ea134e1442dbb81420dd050ab14