ANALYSIS AND MODELLING OF TEMPORAL CHARACTERISTICS OF SPEECH FOR ESTONIAN TEXT-TO-SPEECH SYNTHESIS.

Authors: Mihkla, Meelis; Kuusik, Jüri
Subject:
Source: Linguistica Uralica; 2005, Vol. 41 Issue 2, p91-97, 7p
Abstract: A text-to-speech system must be capable of generating sounds and pauses with durations that do not noticeably differ from those of natural speech. Currently, the prosodic modelling of Estonian text-to-speech synthesis is largely based on generalized measurements of speech units in isolated words and sentences, and as a result the synthesized speech is often monotonous and has poor fluency. In this work the first attempts are made to improve the naturalness of the output speech of the speech synthesiser with the help of statistical duration models of fluent speech. The source material consisted of (a) prose read out by a professional actor, and (b) news broadcasts read by announcers. On the basis of this material, the variability of the duration of pauses and boundary lengthenings was investigated. It turns out that in the case of a read text at normal speech rate the classification of speech pauses is perfectly possible and can be applied in speech synthesis. An attempt was also made to establish whether and to what extent the syntactic parsing of a text is related to the prosodic parsing of speech. A generalized regression analysis revealed which features are essential in predicting sound durations in speech, and a statistically optimal model was developed. Curiously, the quantity degree of a foot, despite being the cornerstone of Estonian word prosody, was not a significant feature for predicting the duration of a sound on the basis of this material. The results of the modelling were then compared with the expert opinions of some Estonian phoneticians. [ABSTRACT FROM AUTHOR]
Database: Complementary Index