Multiregression analysis of autoregressive with exogenous input speech synthesis parameters and voice qualities

Authors: Hiroshi Kido, Masayoshi Kawamata, Hideki Kasuya
Year of publication: 2004
Subject:
Source: The Journal of the Acoustical Society of America. 116:2545-2545
ISSN: 0001-4966
DOI: 10.1121/1.4785154
Description: This study investigates the relationship between acoustic parameters utilized in the formant‐based ARX (autoregressive with exogenous input) speech synthesis model (J. Acoust. Soc. Jpn., 58, 386–397) and perceived voice qualities of synthetic speech. The acoustic parameters manipulated were F0 baseline, F0 range, spectral tilt of glottal flow (TL), formant scaling parameter (FS), and speaking rate (SR). Japanese expressions associated with voice qualities were high‐pitched/low‐pitched, masculine/feminine, hoarse/clear, calm/excited, powerful/weak, youthful/elderly, thick/thin, and tense/lax (Proc. ICSLP‐98, No. 1005). A sentence utterance of an average speaker selected from a database of 109 male speakers was analyzed using the ARX method. Each of the five acoustic parameters of the utterance was manipulated at three levels, producing 243 samples of synthetic speech (3×3×3×3×3). Ten subjects evaluated the voice qualities of each of the 243 synthetic stimuli with regard to the eight Japanese expressions. Multiregression analysis showed that F0 range, F0 baseline, and FS were the primary acoustic correlates of high‐pitched/low‐pitched and masculine/feminine; SR and F0 range of calm/excited; and F0 range, SR, and F0 baseline of thick/thin. No significant relations were found for the remaining Japanese expressions, which was thought to be associated in part with irregularities of glottal flow.
Database: OpenAIRE
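
The design described in the abstract (five parameters at three levels each, 243 stimuli, followed by multiple regression of ratings on the parameters) can be illustrated with a minimal sketch. Everything numeric here is an assumption for illustration only: the coded levels -1/0/1, the weights in `true_weights`, the intercept, and the noise level are invented, and the ratings are simulated, not the study's data. The abstract does not state the actual parameter values, rating scale, or regression procedure, so ordinary least squares via `numpy.linalg.lstsq` is used as a plausible stand-in.

```python
import itertools
import numpy as np

# The five manipulated parameters named in the abstract.
FACTORS = ["F0_baseline", "F0_range", "TL", "FS", "SR"]
# Coded low / mid / high levels (assumed; the abstract gives no values).
LEVELS = (-1.0, 0.0, 1.0)


def full_factorial(n_factors=len(FACTORS), levels=LEVELS):
    """Return every combination of levels: 3**5 = 243 rows for 5 factors."""
    return np.array(list(itertools.product(levels, repeat=n_factors)))


def fit_multiple_regression(X, y):
    """Ordinary least squares with an intercept; returns [intercept, b1..bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


design = full_factorial()

# Simulated ratings for ONE hypothetical scale. The weights are made up
# and do not reproduce any result reported in the study.
rng = np.random.default_rng(0)
true_weights = np.array([0.8, 0.5, 0.0, 0.6, 0.0])
ratings = 4.0 + design @ true_weights + rng.normal(0.0, 0.1, size=len(design))

coef = fit_multiple_regression(design, ratings)
for name, b in zip(["intercept"] + FACTORS, coef):
    print(f"{name:12s} {b:+.3f}")
```

Because the full factorial design is balanced, the coded predictor columns are mutually orthogonal, so each estimated coefficient reflects one parameter's contribution independently of the others. In a setting like the one described, such a fit would be repeated once per rating scale.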