Significance of Pseudo-syllables in building better acoustic models for Indian English TTS
Author: | S. Aswin Shanmugam, S. Rupak Vignesh, Hema A. Murthy |
---|---|
Year of publication: | 2016 |
Subject: |
Indian English
Computer science; Syllabification; Speech recognition; Lexicon; Sonority hierarchy; Syllable; Hidden Markov model; Artificial intelligence; Natural language processing; Sentence; 05 social sciences; 0501 psychology and cognitive sciences; 050105 experimental psychology; 03 medical and health sciences; 0305 other medical science; 030507 speech-language pathology & audiology |
Source: | ICASSP |
Description: | Signal-processing-based landmark detection is more precise than HMM-based alignment, primarily because the location of the landmark is not factored into the estimation of parameters. Acoustic cues for syllable boundaries are usually obtained by exploiting the inherent sonority characteristics of a syllable. Because syllabification of the text is based on generalized rules or lexicon definitions, the acoustic and lexical segments do not match under non-native syllabification. In this paper, an attempt is made to modify the syllabification rules for Indian English using acoustic cues obtained from syllable boundaries. The modified syllabifier is used to syllabify the text. Embedded re-estimation is performed using forced alignment at the modified syllable level to obtain refined phoneme boundaries. Indian English Text-to-Speech (TTS) systems are built using labels obtained after (i) embedded re-estimation at the sentence level and (ii) the aforementioned procedure. The reduction in word error rate for both native Aryan and Dravidian speakers (relative reductions of 54.1% and 52.4%, respectively) suggests a significant improvement in synthesis quality for the proposed system. |
Database: | OpenAIRE |
External link: |
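
The description reports its improvement as a relative reduction in word error rate (WER). As a minimal sketch of how such a figure is typically computed, the Python snippet below derives a relative reduction from a baseline and a proposed WER. The function name `relative_reduction` and the two WER values are hypothetical, chosen only for illustration; the record does not give the absolute error rates.

```python
def relative_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction, in percent, of proposed vs. baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical absolute WERs (in percent); NOT taken from the paper.
baseline = 20.0   # assumed: system built from sentence-level re-estimation labels
proposed = 9.18   # assumed: system built from modified-syllable-level labels

print(f"{relative_reduction(baseline, proposed):.1f}% relative reduction")
```

A relative reduction is measured against the baseline error rather than in absolute percentage points, so the same relative figure can correspond to very different absolute gains.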