Evaluation of the Impact of Corpus Phonetic Alignment on the HMM-Based Speech Synthesis Quality
Author: | Marc Evrard, Christophe d'Alessandro, Albert Rilliard |
---|---|
Contributors: | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Rilliard, Albert |
Language: | English |
Year of publication: | 2015 |
Subject: | Subjective evaluation; Computer science; Phonetic labeling; Speech recognition; Speech synthesis; 02 engineering and technology; Pronunciation; 01 natural sciences; TTS; Annotation; Phonetic search technology; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Segmentation; Phonetic alignment; [SHS.LANGUE] Humanities and Social Sciences/Linguistics; Hidden Markov model; 010301 acoustics; HMM-based speech synthesis; 020206 networking & telecommunications; Speech corpus; MOS; French speech synthesis; Spelling; Artificial intelligence; HTS; Natural language processing |
Source: | International Conference on Statistical Language and Speech Processing (SLSP 2015), 2015, Budapest, Hungary. pp.62-72; HAL; Statistical Language and Speech Processing, ISBN: 9783319257884; SLSP |
Description: | This study investigates the impact of phonetization and phonetic segmentation of training corpora on the quality of HMM-based TTS synthesis. HMM-TTS requires phonetic symbols aligned to the speech corpus in order to train the models used for synthesis. Phonetic annotation is a complex task, since pronunciation usually differs from spelling and also varies across regional accents. In this paper, the infrastructure of a French TTS system is presented. A corpus in which the phonetic label occurrences were systematically modified (number of schwas and liaisons) and the label boundaries were displaced was used to train several systems, one for each condition. A perceptual evaluation of the influence of labeling accuracy on synthetic speech quality was conducted. Despite the extent of the annotation changes, the synthetic speech quality of the five best systems remained close to that of the reference system, built upon the corpus whose labels were manually corrected. |
Database: | OpenAIRE |