Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech

Authors:	Gustav Eje Henter, Catherine Mayo, Simon King, Matt Shannon, Thomas Merritt
Publication year:	2014
Subject:	Computer science; Speech recognition; Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing); 020206 networking & telecommunications; Speech synthesis; Speech corpus; 02 engineering and technology; Filter (signal processing); Covariance; computer.software_genre; 01 natural sciences; Computer Science::Sound; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Active listening; 010301 acoustics; computer; Independence (probability theory); Parametric statistics
Source:	INTERSPEECH; Henter, G E, Merritt, T, Shannon, M, Mayo, C & King, S 2014, Measuring the Perceptual Effects of Modelling Assumptions in Speech Synthesis Using Stimuli Constructed from Repeated Natural Speech. In INTERSPEECH 2014 15th Annual Conference of the International Speech Communication Association. pp. 1504-1508. <http://www.isca-speech.org/archive/interspeech_2014/i14_1504.html>; Scopus-Elsevier
DOI:	10.21437/interspeech.2014-361
Description:	Acoustic models used for statistical parametric speech synthesis typically incorporate many modelling assumptions. It is an open question to what extent these assumptions limit the naturalness of synthesised speech. To investigate this question, we recorded a speech corpus where each prompt was read aloud multiple times. By combining speech parameter trajectories extracted from different repetitions, we were able to quantify the perceptual effects of certain commonly used modelling assumptions. Subjective listening tests show that taking the source and filter parameters to be conditionally independent, or using diagonal covariance matrices, significantly limits the naturalness that can be achieved. Our experimental results also demonstrate the shortcomings of mean-based parameter generation. Index terms: speech synthesis, acoustic modelling, stream independence, diagonal covariance matrices, repeated speech
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3ed9bcafd62b4ecdd4a6eb42f96716b4 https://doi.org/10.21437/interspeech.2014-361