What concept-to-speech can gain for prosody

Authors:	Markus Schnell, Rüdiger Hoffmann
Publication year:	2004
Subject:	Computer science; Speech recognition; Speech technology; Reinforcement learning; Contrast (statistics); Speech synthesis; Quality (business); Prosody; Preference
Source:	INTERSPEECH
DOI:	10.21437/interspeech.2004-486
Description:	This article proposes a concept-to-speech system that learns prosody automatically through reinforcement learning. The system, named Demosthenes, extends the text-to-speech system DreSS. Demosthenes handles template-based text generation and symbolic prosody prediction, while DreSS handles acoustic prosody and speech synthesis. The prosody predictor applies reinforcement learning, with a state space built from four indicators: content, given versus new information, contrast, and the number of words since the last accented word. The system is trained with a simple rule that assigns reward according to prediction performance on a small sample text. To gauge the gain in prosodic quality, we compare the concept-to-speech system with an existing text-to-speech system. The results indicate a clear preference for the concept-to-speech system.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::a3d71023ea2ca772172ba1132ee9e7cf https://doi.org/10.21437/interspeech.2004-486
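
Illustrative sketch (not taken from the paper): the abstract describes an accent predictor trained by reinforcement learning, with a state built from content, given/new status, contrast, and the distance to the last accented word, and with reward based on prediction performance on a small sample text. The abstract does not name the learning algorithm, so the code below assumes tabular Q-learning purely for illustration. The sample sentence, its annotations, the reading of "content" as "content word versus function word", the reward values, the hyperparameters, and all function names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical annotated sample: (is_content_word, is_new, is_contrastive, reference_accent).
# The reference accents are invented for illustration, not data from the paper.
SAMPLE = [
    (False, False, False, 0),  # "the"
    (True,  True,  False, 1),  # "train"
    (False, False, False, 0),  # "to"
    (True,  True,  False, 1),  # "Dresden"
    (True,  False, False, 0),  # "leaves"
    (False, False, False, 0),  # "at"
    (True,  True,  True,  1),  # "nine"
]

ACTIONS = (0, 1)  # 0 = leave the word unaccented, 1 = accent it
MAX_GAP = 3       # assumed cap on "words since the last accent"


def make_state(word, gap):
    """Build the state from the three word features and the capped accent gap."""
    content, new, contrast, _ = word
    return (content, new, contrast, min(gap, MAX_GAP))


def train(sample, episodes=500, alpha=0.2, gamma=0.5, epsilon=0.1, seed=0):
    """Tabular Q-learning: reward +1 for matching the reference accent, else -1."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        gap = MAX_GAP  # assume no accent in the recent left context
        for i, word in enumerate(sample):
            state = make_state(word, gap)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            reward = 1.0 if action == word[3] else -1.0
            next_gap = 0 if action == 1 else gap + 1
            if i + 1 < len(sample):
                next_state = make_state(sample[i + 1], next_gap)
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            else:
                best_next = 0.0  # end of the sample text
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            gap = next_gap
    return q


def predict(q, sample):
    """Greedy accent prediction with the learned Q-table."""
    gap = MAX_GAP
    accents = []
    for word in sample:
        state = make_state(word, gap)
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        accents.append(action)
        gap = 0 if action == 1 else gap + 1
    return accents


q_table = train(SAMPLE)
predicted = predict(q_table, SAMPLE)
```

Here `predicted` holds one 0/1 accent decision per word of the toy sentence. In a concept-to-speech setting the three word features would come from the text generator rather than from hand annotation; how Demosthenes actually encodes them is not stated in the abstract.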