Duration modeling using DNN for Arabic speech synthesis

Authors:	Denis Jouvet, Amal Houidhek, Zied Mnasri, Imene Zangar, Vincent Colotte
Contributors:	Ecole Nationale d'Ingénieurs de Tunis (ENIT), Université de Tunis El Manar (UTM); Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Jouvet, Denis
Language:	English
Publication year:	2018
Subjects:	Consonant; [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; Computer science; Speech recognition; 020206 networking & telecommunications; Speech synthesis; 02 engineering and technology; computer.software_genre; Task (project management); MERLIN; phoneme duration modeling; Gemination; Duration (music); Vowel; 0202 electrical engineering, electronic engineering, information engineering; Arabic TTS; 020201 artificial intelligence & image processing; HTS; Set (psychology); computer; Parametric statistics; DNN
Source:	9th International Conference on Speech Prosody, Jun 2018, Poznań, Poland
Description:	Duration modeling is a key task for every parametric speech synthesis system. Although such parametric systems have been adapted to many languages, no special attention has been paid to explicitly handling the characteristics of Arabic speech. In Arabic, phoneme duration plays a distinctive role because of consonant gemination and vowel quantity, so precise modeling of sound durations is critical. In this paper we compare several approaches to modeling phoneme durations (including duration modeling with the HTS and MERLIN toolkits), and we propose a new approach that relies on a set of models, each one being optimal for a given phoneme class (e.g., simple consonants, geminated consonants, short vowels, and long vowels). An objective evaluation carried out on a set of test sentences shows that the proposed approach leads to more accurate modeling of phoneme durations.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ebfbda24a4a53b83db8a41e285f2ce56 https://hal.inria.fr/hal-01889917/document
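
Illustrative note: the description mentions using a set of models, each one optimal for a given phoneme class. The record does not say how the best model per class is chosen, so the following is only a minimal sketch under stated assumptions: candidate models are compared by RMSE on held-out data, and the names `model_a`, `model_b`, `select_best_per_class`, and `combine` are hypothetical, not taken from the paper or from the HTS or MERLIN toolkits.

```python
from statistics import mean


def rmse(predicted, reference):
    """Root mean square error between two equal-length duration lists."""
    return mean((p - r) ** 2 for p, r in zip(predicted, reference)) ** 0.5


def select_best_per_class(predictions, reference, classes):
    """Pick, for each phoneme class, the model with the lowest RMSE.

    predictions: {model_name: [predicted duration per phoneme]}
    reference:   [reference duration per phoneme]
    classes:     [phoneme class label per phoneme]
    Returns {class_label: model_name}.
    Assumption: RMSE on held-out data is the selection criterion.
    """
    best = {}
    for cls in set(classes):
        idx = [i for i, c in enumerate(classes) if c == cls]
        ref = [reference[i] for i in idx]
        scores = {
            name: rmse([preds[i] for i in idx], ref)
            for name, preds in predictions.items()
        }
        best[cls] = min(scores, key=scores.get)
    return best


def combine(predictions, classes, best):
    """For each phoneme, use the prediction of the model chosen for its class."""
    return [predictions[best[c]][i] for i, c in enumerate(classes)]


if __name__ == "__main__":
    # Toy durations in milliseconds, invented purely for illustration.
    classes = ["short_vowel", "long_vowel", "short_vowel", "long_vowel"]
    reference = [60, 120, 70, 130]
    predictions = {
        "model_a": [61, 100, 69, 110],
        "model_b": [80, 121, 50, 129],
    }
    best = select_best_per_class(predictions, reference, classes)
    print(best)
    print(combine(predictions, classes, best))
```

In practice the selection would be made on a development set and then applied to unseen test sentences; the toy example reuses the same data only to keep the sketch short.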