Evaluation of the Impact of Corpus Phonetic Alignment on the HMM-Based Speech Synthesis Quality

Authors: Marc Evrard, Christophe d'Alessandro, Albert Rilliard
Contributors: Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Rilliard, Albert
Language: English
Year of publication: 2015
Subject:
Source: International Conference on Statistical Language and Speech Processing (SLSP 2015)
International Conference on Statistical Language and Speech Processing (SLSP 2015), 2015, Budapest, Hungary. pp.62-72
HAL
Statistical Language and Speech Processing ISBN: 9783319257884
SLSP
Description: This study investigates the impact of phonetization and phonetic segmentation of training corpora on the quality of HMM-based TTS synthesis. HMM-TTS requires phonetic symbols aligned to the speech corpus in order to train the models used for synthesis. Phonetic annotation is a complex task, since pronunciation usually differs from spelling and varies among regional accents. In this paper, the infrastructure of a French TTS system is presented. A corpus in which the phonetic label occurrences were systematically modified (number of schwas and liaisons) and the label boundaries were displaced was used to train several systems, one for each condition. A perceptual evaluation of the influence of labeling accuracy on synthetic speech quality was conducted. Despite the degree of annotation changes, the synthetic speech quality of the five best systems remained close to that of the reference system, built upon the corpus whose labels were manually corrected.
Database: OpenAIRE