Prosody Hierarchy Construction for Mixed Chinese-English Spelling Speech and its Application to TTS

Autor: Tsai, Cheng-Yeh, 蔡承燁
Rok vydání: 2010
Druh dokumentu: 學位論文 ; thesis
Popis: 99
In this thesis, an unsupervised joint prosody labeling and modeling (PLM) method for mixed Chinese-English word spelling speech is proposed. It labels an unlabeled corpus with two types of prosodic tags (i.e., break type of inter-syllable juncture and prosodic state of syllable) and builds four prosodic models simultaneously. The break tags can be used to delimit prosodic constituents of a hierarchical prosody structure, and the prosodic state can be used to construct the prosodic feature patterns of prosodic constituents. The four prosodic models describe the relationships of acoustic prosodic features, prosodic tags of utterances, and the linguistic features of the associated texts. The experimental results showed that prosodic variation in English word spelling was influenced by both the prosodic state that describes underlying intonation and Chinese tone borrowing effect. Besides, the relationship between hierarchical noun phrase structure and corresponding break type was also analyzed. The analysis suggested that magnitude of the break type was highly correlated with syntactic hierarchy in a noun phrase. Lastly, we propose two prosody generation methods for mixed Chinese-English word spelling Text-to-Speech system (TTS) based on PLM. In the first method, a break predictor is constrcted by CART method. Then, the related linguistic features and the predicted break tags are used for HMM-based Text-to-Speech system (HTS) training. In the second method, PLM is directly used as a prosody generator. Experimental results confirmed that the proposed method one was superior to the conventional HTS that only use linguistic features both in objective and subjective tests. Besides, the proposed method two was significantly better than the conventional HTS method at syllable duration prediction. Therefore, we conclude that the proposed PLM method was successful in prosody labeling and modeling for constructing a mixed Chinese-English word spelling TTS.
Databáze: Networked Digital Library of Theses & Dissertations