Emotional speech synthesis with rich and granularized control
Author: | Chung-Hyun Ahn, Sangshin Oh, Hong-Goo Kang, Inseon Jang, Se-Yun Um, Kyungguen Byun |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Sound (cs.SD); Computer science; Speech recognition; 020206 networking & telecommunications; Speech synthesis; 02 engineering and technology; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Computer Science - Sound; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; 0202 electrical engineering, electronic engineering, information engineering; Embedding; Emotional expression; Control (linguistics); computer; 0105 earth and related environmental sciences; Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | ICASSP |
Description: | This paper proposes an effective emotion control method for an end-to-end text-to-speech (TTS) system. To flexibly control the distinct characteristics of a target emotion category, it is essential to determine the embedding vectors that represent the TTS input. We apply an inter-to-intra emotional distance ratio algorithm to the embedding vectors, which minimizes the distance to the target emotion category while maximizing the distance to the other emotion categories. To further enhance the expressiveness of the target speech, we also introduce an interpolation technique that allows the intensity of a target emotion to be changed gradually toward that of neutral speech. Subjective evaluation results in terms of emotional expressiveness and controllability show the superiority of the proposed algorithm over conventional methods. Submitted to ICASSP 2020. |
Database: | OpenAIRE |
External link: |
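
The two ideas named in the description, an inter-to-intra distance ratio over emotion embeddings and interpolation between a neutral and an emotional embedding, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the exact formulation, so the Euclidean distance, the mean-distance ratio, the linear interpolation, and every name below (`inter_to_intra_ratio`, `interpolate`, the toy 2-D vectors) are assumptions made purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inter_to_intra_ratio(candidate, target_vectors, other_vectors):
    """Mean distance to other-category embeddings divided by the mean
    distance to target-category embeddings. A larger value means the
    candidate sits close to its own category and far from the others.
    Hypothetical formulation, not taken from the paper."""
    intra = sum(euclidean(candidate, v) for v in target_vectors) / len(target_vectors)
    inter = sum(euclidean(candidate, v) for v in other_vectors) / len(other_vectors)
    return inter / (intra + 1e-12)  # small constant avoids division by zero


def interpolate(neutral, emotion, alpha):
    """Linear blend: alpha=0 gives the neutral embedding, alpha=1 the
    full-intensity emotion embedding (assumed interpolation scheme)."""
    return [(1 - alpha) * n + alpha * e for n, e in zip(neutral, emotion)]


# Toy 2-D embeddings, invented for this example only.
happy = [[1.0, 0.0], [0.8, 0.2]]
sad = [[-1.0, 0.0], [-0.8, -0.2]]
neutral = [0.0, 0.0]

# Pick the "happy" embedding that best separates from "sad".
best_happy = max(happy, key=lambda v: inter_to_intra_ratio(v, happy, sad))

# Sweep the emotion intensity from neutral to full strength.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, interpolate(neutral, best_happy, alpha))
```

In a real system the vectors would be learned style or emotion embeddings fed to the TTS model, and `alpha` would act as the user-facing intensity control.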