Multimodal Voice Conversion Under Adverse Environment Using a Deep Convolutional Neural Network

Autor: Jian Zhou, Yuting Hu, Hailun Lian, Huabin Wang, Liang Tao, Hon Keung Kwan
Jazyk: angličtina
Rok vydání: 2019
Předmět:
Zdroj: IEEE Access, Vol 7, Pp 170878-170887 (2019)
Druh dokumentu: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2019.2955982
Popis: This paper presents a voice conversion (VC) technique under noisy environments. Typically, VC methods use only audio information for conversion in a noiseless environment. However, existing conversion methods do not always achieve satisfactory results in an adverse acoustic environment. To solve this problem, we propose a multimodal voice conversion model based on a deep convolutional neural network (MDCNN) built by combining two convolutional neural networks (CNN) and a deep neural network (DNN) for VC under noisy environments. In the MDCNN, both the acoustic and visual information are incorporated into the voice conversion to improve its robustness in adverse acoustic conditions. The two CNNs are designed to extract acoustic and visual features, and the DNN is designed to capture the nonlinear mapping relation of source speech and target speech. Experimental results indicate that the proposed MDCNN outperforms two existing approaches in noisy environments.
Databáze: Directory of Open Access Journals