Speaker adaptation of speaking rate-dependent hierarchical prosodic model for Mandarin TTS
Author: | Sin-Horng Chen, Yih-Ru Wang, Po-Chun Wang, Chen-Yu Chiang, I-Bin Liao |
---|---|
Year of publication: | 2014 |
Subject: | Computer science, Speech recognition, Extrapolation, Speaker recognition, Mandarin Chinese, Speaker diarisation, Range (mathematics), Artificial intelligence, Adaptation (computer science), Prosody, Natural language processing, Speaker adaptation |
Source: | ISCSLP |
DOI: | 10.1109/iscslp.2014.6936616 |
Description: | This paper proposes a speaker adaptation method that adapts an existing speaking rate-dependent hierarchical prosodic model (SR-HPM) of an SR-controlled Mandarin TTS system to a new speaker's data in order to realize a new voice. Two main problems are addressed: data sparseness, because the few adaptation utterances cover only a small range of normal speaking rates, and the absence of adaptation data in both the fast and slow speaking-rate ranges. The proposed method follows the SR-HPM training procedure: it first normalizes the prosodic-acoustic features of the new speaker's speech data, then trains an HPM with the prosody labeling and modeling algorithm, and finally refines the HPM into an SR-dependent model. Maximum a posteriori (MAP) adaptation with model parameter extrapolation is applied to cope with these two problems. Experimental results on a male speaker's adaptation data confirmed that the resulting adaptive SR-HPM has reasonable parameters covering a wide range of speaking rates and can therefore be used in the TTS system to generate prosodic-acoustic features for synthesizing the new speaker's voice at any given SR. |
Database: | OpenAIRE |