Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
Author: Duret, Jarod; Parcollet, Titouan; Estève, Yannick
Contributors: Laboratoire Informatique d'Avignon (LIA), Avignon Université (AU)-Centre d'Enseignement et de Recherche en Informatique - CERI; Samsung AI Center [Cambridge]; University of Cambridge [UK] (CAM); European Project: 957017, SELMA
Language: English
Year of publication: 2023
Subject: FOS: Computer and information sciences; Sound (cs.SD); Computer Science - Computation and Language; speech synthesis; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; prosody prediction; speech generation; Computation and Language (cs.CL); Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Source: Speech Synthesis Workshop (SSW), Aug 2023, Grenoble, France
Description: International audience; We propose a method for speech-to-speech emotion-preserving translation that operates at the level of discrete speech units. Our approach relies on a multilingual emotion embedding that can capture affective information in a language-independent manner. We show that this embedding can be used to predict the pitch and duration of speech units in a target language, allowing us to resynthesize the source speech signal with the same emotional content. We evaluate our approach on English and French speech signals and show that it outperforms a baseline method that does not use emotional information, including when the emotion embedding is extracted from a different language. Although this preliminary study does not directly address the machine translation problem, our results demonstrate the effectiveness of our approach for cross-lingual emotion preservation in the context of speech resynthesis.
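The abstract describes the pipeline only at a high level: discrete speech units carry the content, while an utterance-level emotion embedding conditions the prediction of per-unit pitch and duration before resynthesis. The sketch below illustrates one plausible shape for such a predictor in PyTorch; the module name, dimensions, and the BiLSTM architecture are assumptions for illustration, not the authors' actual model.

```python
import torch
import torch.nn as nn

class ProsodyPredictor(nn.Module):
    """Hypothetical predictor of per-unit pitch and duration from discrete
    speech units, conditioned on a language-independent emotion embedding."""

    def __init__(self, num_units=100, unit_dim=128, emotion_dim=256, hidden_dim=256):
        super().__init__()
        self.unit_embedding = nn.Embedding(num_units, unit_dim)
        self.encoder = nn.LSTM(unit_dim + emotion_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.pitch_head = nn.Linear(2 * hidden_dim, 1)     # e.g. log-F0 per unit
        self.duration_head = nn.Linear(2 * hidden_dim, 1)  # e.g. log-duration per unit

    def forward(self, units, emotion_emb):
        # units: (batch, seq_len) discrete unit IDs, e.g. HuBERT k-means clusters
        # emotion_emb: (batch, emotion_dim) utterance-level emotion embedding
        x = self.unit_embedding(units)                            # (B, T, unit_dim)
        emo = emotion_emb.unsqueeze(1).expand(-1, x.size(1), -1)  # broadcast over T
        h, _ = self.encoder(torch.cat([x, emo], dim=-1))          # (B, T, 2*hidden_dim)
        return self.pitch_head(h).squeeze(-1), self.duration_head(h).squeeze(-1)

# Usage: predict prosody for a small batch of unit sequences.
model = ProsodyPredictor()
units = torch.randint(0, 100, (2, 50))   # assumed codebook of 100 units
emotion = torch.randn(2, 256)            # stand-in for the multilingual embedding
pitch, duration = model(units, emotion)
print(pitch.shape, duration.shape)       # torch.Size([2, 50]) twice
```

Broadcasting the utterance-level embedding across every unit position is one simple way to condition a sequence model; the property the abstract relies on is that this embedding is language-independent, so it can be extracted from source-language speech and applied to target-language units.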
Database: OpenAIRE
External link: