Multimodal Voice Conversion Under Adverse Environment Using a Deep Convolutional Neural Network

Author:	Jian Zhou, Yuting Hu, Hailun Lian, Huabin Wang, Liang Tao, Hon Keung Kwan
Language:	English
Year of publication:	2019
Subject:	Audio and video feature fusion; convolutional neural network; deep learning; mel-frequency cepstral coefficients; multilayer feedforward neural networks; multimodal voice conversion; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 7, Pp 170878-170887 (2019)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2955982
Description:	This paper presents a voice conversion (VC) technique for noisy environments. Typically, VC methods use only audio information and assume a noiseless environment, so existing conversion methods do not always achieve satisfactory results in adverse acoustic conditions. To solve this problem, we propose a multimodal voice conversion model based on a deep convolutional neural network (MDCNN), built by combining two convolutional neural networks (CNNs) and a deep neural network (DNN). In the MDCNN, both acoustic and visual information are incorporated into the voice conversion to improve its robustness in adverse acoustic conditions. The two CNNs are designed to extract acoustic and visual features, and the DNN is designed to capture the nonlinear mapping between source speech and target speech. Experimental results indicate that the proposed MDCNN outperforms two existing approaches in noisy environments.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/3ff1a09465cd49f0bc4157af2608eea0
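
The architecture named in the description (two CNN branches feeding one DNN) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the record gives no layer sizes, fusion rule, or training procedure, so everything below is an assumption made for illustration. In particular, fusion by concatenation, mean pooling over time, the kernel width, the hidden size, the 13-dimensional MFCC input, the 6-dimensional visual input, and all function names (`conv1d_relu`, `mean_pool`, `dense`, `init_params`, `convert`) are hypothetical. Weights are random and untrained.

```python
import random


def conv1d_relu(frames, kernels):
    """Valid 1-D convolution over time, followed by ReLU.

    frames:  list of T feature vectors, each of length D
    kernels: list of K kernels, each a width x D matrix
    Returns a list of (T - width + 1) vectors of length K.
    """
    width = len(kernels[0])
    dim = len(frames[0])
    out = []
    for t in range(len(frames) - width + 1):
        row = []
        for kernel in kernels:
            s = sum(kernel[w][d] * frames[t + w][d]
                    for w in range(width) for d in range(dim))
            row.append(max(0.0, s))
        out.append(row)
    return out


def mean_pool(feature_maps):
    """Average each channel over the time axis."""
    n = len(feature_maps)
    return [sum(col) / n for col in zip(*feature_maps)]


def dense(x, weights, relu=True):
    """Fully connected layer without bias; optional ReLU."""
    y = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, v) for v in y] if relu else y


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(rng, audio_dim, visual_dim, out_dim,
                n_kernels=4, width=3, hidden=8):
    """Random (untrained) weights; all sizes are illustrative assumptions."""
    return {
        "audio_kernels": [rand_matrix(rng, width, audio_dim)
                          for _ in range(n_kernels)],
        "visual_kernels": [rand_matrix(rng, width, visual_dim)
                           for _ in range(n_kernels)],
        "hidden": rand_matrix(rng, hidden, 2 * n_kernels),
        "out": rand_matrix(rng, out_dim, hidden),
    }


def convert(params, audio_frames, visual_frames):
    """Map a window of source audio and visual frames to one target vector."""
    # Branch 1: CNN over acoustic features (e.g. MFCC frames).
    a = mean_pool(conv1d_relu(audio_frames, params["audio_kernels"]))
    # Branch 2: CNN over visual features (e.g. lip-region descriptors).
    v = mean_pool(conv1d_relu(visual_frames, params["visual_kernels"]))
    # Assumed fusion rule: simple concatenation of the two embeddings.
    fused = a + v
    # DNN: nonlinear mapping from fused features to target speech features.
    h = dense(fused, params["hidden"])
    return dense(h, params["out"], relu=False)


if __name__ == "__main__":
    rng = random.Random(0)
    params = init_params(rng, audio_dim=13, visual_dim=6, out_dim=13)
    audio = [[rng.random() for _ in range(13)] for _ in range(5)]
    visual = [[rng.random() for _ in range(6)] for _ in range(5)]
    print(len(convert(params, audio, visual)))
```

The sketch keeps the two modalities in separate convolutional branches so that each can learn its own feature extractor, which is the design the description attributes to the MDCNN; how the published model actually fuses and maps the features should be taken from the article itself.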