Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection
Author: | Jaime Lorenzo-Trueba, Jonas Rohnke, Thomas Drugman, Shubhi Tyagi, Marco Nicolis |
---|---|
Publication year: | 2019 |
Subject: | FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Computer science; Speech recognition; Speech synthesis; Prosody; Intonation (linguistics); Selection (linguistics); Embedding; Reading (process); Rule-based machine translation; 01 natural sciences; 0105 earth and related environmental sciences; 010501 environmental sciences; 05 social sciences; 0501 psychology and cognitive sciences; 050105 experimental psychology |
Source: | INTERSPEECH |
DOI: | 10.48550/arxiv.1912.00955 |
Description: | Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human capabilities when considering isolated sentences. But something which is still lacking in order to achieve human-like communication is the dynamic variations and adaptability of human speech. This work attempts to solve the problem of achieving a more dynamic and natural intonation in TTS systems, particularly for stylistic speech such as the newscaster speaking style. We propose a novel embedding selection approach which exploits linguistic information, leveraging the speech variability present in the training dataset. We analyze the contribution of both semantic and syntactic features. Our results show that the approach improves the prosody and naturalness for complex utterances as well as in Long Form Reading (LFR). |
Database: | OpenAIRE |
External link: |