Significance of Pseudo-syllables in building better acoustic models for Indian English TTS

Authors: S. Aswin Shanmugam, S. Rupak Vignesh, Hema A. Murthy
Year of publication: 2016
Subject:
Source: ICASSP
Description: Signal-processing-based landmark detection is more precise than HMM-based alignment, primarily because the location of the landmark is not factored into the estimation of HMM parameters. Acoustic cues for syllable boundaries are usually obtained by exploiting the inherent sonority characteristics of a syllable. Because syllabification of the text is based on generalized rules or lexicon definitions, the acoustic and lexical segments are mismatched under non-native syllabification. In this paper, an attempt is made to modify the syllabification rules for Indian English using acoustic cues obtained from syllable boundaries. The modified syllabifier is used to syllabify the text. Embedded re-estimation is then performed using forced alignment at the modified syllable level to obtain refined phoneme boundaries. Indian English Text-to-Speech (TTS) systems are built using labels obtained after (i) embedded re-estimation at the sentence level and (ii) the aforementioned procedure. Relative reductions in word error rate of 54.1% and 52.4% for native Aryan and Dravidian speakers, respectively, suggest a significant improvement in synthesis quality with the proposed system.
Database: OpenAIRE