Description: |
This study investigates the relationship between the acoustic parameters used in the formant‐based ARX (autoregressive with exogenous input) speech synthesis model (J. Acoust. Soc. Jpn., 58, 386–397) and the perceived voice qualities of synthetic speech. Five acoustic parameters were manipulated: F0 baseline, F0 range, spectral tilt of glottal flow (TL), formant scaling parameter (FS), and speaking rate (SR). Voice qualities were rated on eight pairs of Japanese expressions: high‐pitched/low‐pitched, masculine/feminine, hoarse/clear, calm/excited, powerful/weak, youthful/elderly, thick/thin, and tense/lax (Proc. ICSLP‐98, No. 1005). A sentence utterance of an average speaker, selected from a database of 109 male speakers, was analyzed using the ARX method. Each of the five acoustic parameters of the utterance was manipulated at three levels, producing 243 samples of synthetic speech (3×3×3×3×3). Ten subjects evaluated the voice qualities of each of the 243 synthetic stimuli with regard to the eight Japanese expressions. Multiple regression analysis showed that F0 range, F0 baseline, and FS were the primary acoustic correlates of high‐pitched/low‐pitched and masculine/feminine; SR and F0 range were the primary correlates of calm/excited; and F0 range, SR, and F0 baseline were the primary correlates of thick/thin. No significant relations were found for the remaining Japanese expressions, a result thought to be associated in part with irregularities of glottal flow. |
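The stimulus count described above follows from a full factorial design: five parameters at three levels each give 3^5 = 243 combinations. A minimal sketch of that enumeration is given below. The level labels ("low", "mid", "high") and the function name `build_design` are hypothetical, chosen only for illustration; the abstract does not state the actual parameter values used.

```python
from itertools import product

# Parameter names as given in the abstract.
PARAMETERS = ["F0 baseline", "F0 range", "TL", "FS", "SR"]

# Hypothetical level labels: the abstract says "three levels"
# but does not give the actual values.
LEVELS = ["low", "mid", "high"]


def build_design(parameters=PARAMETERS, levels=LEVELS):
    """Return every combination of levels across parameters (full factorial)."""
    return [
        dict(zip(parameters, combo))
        for combo in product(levels, repeat=len(parameters))
    ]


design = build_design()
n_stimuli = len(design)  # 3 ** 5 combinations

# Ten subjects rate each stimulus on eight expression pairs.
n_subjects = 10
n_expressions = 8
n_ratings = n_subjects * n_stimuli * n_expressions

print(n_stimuli)  # 243
print(n_ratings)  # 19440
```

Under these assumptions, each entry of `design` is one synthetic stimulus, and the rating total follows directly from the subject and expression counts reported in the abstract.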