Modeling-By-Generation-Structured Noise Compensation Algorithm for Glottal Vocoding Speech Synthesis System

Author:	Min-Jae Hwang, Hong-Goo Kang, Kyungguen Byun, Eunwoo Song
Year of publication:	2018
Subject:	Excitation signal; Noise measurement; Computer science; Deep learning; Process (computing); Networking & telecommunications; Speech synthesis; Engineering and technology; Natural sciences; Signal; Harmonic analysis; Noise; Computer Science::Sound; Physical sciences; Electrical engineering, electronic engineering, information engineering; Harmonic; Artificial intelligence; Acoustics; Algorithm
Source:	ICASSP
DOI:	10.1109/icassp.2018.8461606
Description:	This paper proposes a novel noise compensation algorithm for the glottal excitation model in a deep learning (DL)-based speech synthesis system. To generate high-quality synthesized speech, the DL network must represent the balance between the harmonic and noise components of the glottal excitation signal well. However, the noise component is hard to model accurately because DL training inevitably produces statistically smoothed outputs; an additional noise compensation process is therefore essential. We propose a noise compensation method based on a modeling-by-generation structure, in which the noise component missing from the generated glottal signal is directly extracted and parameterized throughout the training process. By modeling this noise component with an additional DL network, the proposed system successfully compensates for the missing noise. Objective and subjective test results confirm that speech synthesized with the proposed noise compensation method is superior to speech synthesized with conventional methods.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::431bb68b2072ac4e3435a671243bbf94 https://doi.org/10.1109/icassp.2018.8461606
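
The description above outlines the idea of extracting the noise that a smoothed, generated glottal signal is missing and turning it into parameters that a second network could learn. The following is only a minimal sketch of that idea, not the authors' implementation: the function names, the per-frame log-energy parameterization, and the Gaussian noise used for compensation are all assumptions made for illustration, and the second DL network is left out entirely (its predicted parameters are simply passed in).

```python
import math
import random


def missing_noise(reference, generated):
    """Residual between a reference glottal signal and the generated one.

    Assumption: the "missing noise component" is taken to be the
    sample-wise difference between the two signals.
    """
    if len(reference) != len(generated):
        raise ValueError("signals must have the same length")
    return [r - g for r, g in zip(reference, generated)]


def frame_log_energy(signal, frame_len):
    """Per-frame log energy (a hypothetical noise parameterization)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        energies.append(math.log(energy + 1e-12))  # floor avoids log(0)
    return energies


def compensate(generated, log_energies, frame_len, rng):
    """Add Gaussian noise whose variance matches each frame's log energy.

    In a full system the log energies would come from an additional
    network; here they are supplied directly by the caller.
    """
    out = list(generated)
    for i, log_e in enumerate(log_energies):
        std = math.sqrt(math.exp(log_e))
        for n in range(i * frame_len, (i + 1) * frame_len):
            out[n] += rng.gauss(0.0, std)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    frame_len = 80
    # Toy data: a sinusoid stands in for the harmonic part of the excitation.
    harmonic = [math.sin(2 * math.pi * 0.01 * n) for n in range(4 * frame_len)]
    reference = [h + rng.gauss(0.0, 0.1) for h in harmonic]
    generated = harmonic  # stands in for a statistically smoothed output

    residual = missing_noise(reference, generated)
    params = frame_log_energy(residual, frame_len)
    restored = compensate(generated, params, frame_len, rng)
    print(len(params), len(restored))
```

The residual is computed against the generated signal rather than against an independent noise estimate, which is what makes the structure "modeling by generation" in this sketch: the parameters describe exactly what the first model failed to produce.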