A flexible front-end for HTS
Author: | Thomas Merritt, Matthew P. Aylett, Arnab Ghoshal, Rasmus Dall, Gustav Eje Henter |
---|---|
Publication year: | 2014 |
Subject: | |
Source: | INTERSPEECH. Aylett, M, Dall, R, Ghoshal, A, Henter, G E & Merritt, T 2014, A Flexible Front-End for HTS. In INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, pp. 1283-1287. <http://www.isca-speech.org/archive/interspeech_2014/i14_1283.html> |
DOI: | 10.21437/interspeech.2014-320 |
Description: | Parametric speech synthesis techniques depend on full context acoustic models generated by language front-ends, which analyse linguistic and phonetic structure. HTS, the leading parametric synthesis system, can use a number of different front-ends to generate full context models for synthesis and training. In this paper we explore the use of a new text processing front-end that has been added to the speech recognition toolkit Kaldi as part of an ongoing project to produce a new parametric speech synthesis system, Idlak. The use of XML specification files, a modular design, and modern coding and testing approaches make the Idlak front-end ideal for adding, altering and experimenting with the contexts used in full context acoustic models. The Idlak front-end was evaluated against the standard Festival front-end in the HTS system. Results from the Idlak front-end compare well with the more mature Festival front-end (Idlak 2.83 MOS vs Festival 2.85 MOS), although a slight reduction in naturalness perceived by non-native English speakers can be attributed to Festival's insertion of non-punctuated pauses. Index Terms: speech synthesis, text processing, parametric synthesis, Kaldi, Idlak |
Database: | OpenAIRE |