Popis: |
The authors describe analysis and synthesis methods for improving the quality of speech produced by D.H. Klatt's (J. Acoust. Soc. Am., vol.67, p.971-95, 1980) software formant synthesizer. Synthetic speech generated using an excitation waveform resembling the glotal volume-velocity was found to be perceptually preferred over speech synthesized using other types of excitation. In addition, listeners ranked speech tokens synthesized with an excitation waveform that simulated the effects of source-tract interaction higher in neutralness than tokens synthesized without such interaction. A series of algorithms for silent and voiced/unvoiced/mixed excitation interval classification, pitch detection, formant estimation and formant tracking was developed. The algorithms can utilize two channels of input data, i.e., speech and electroglottographic signals, and can therefore surpass the performance of single-channel (acoustic-signal-based) algorithms. The formant synthesizer was used to study some aspects of the acoustic correlates of voice quality, e.g., male/female voice conversion and the simulation of breathiness, roughness, and vocal fry. > |