A Log Domain Pulse Model for Parametric Speech Synthesis
Authors: | Pierre Lanchantin, Mark J. F. Gales, Gilles Degottex |
---|---|
Contributors: | Gales, Mark [0000-0002-5311-8219], Apollo - University of Cambridge Repository |
Year of publication: | 2018 |
Subjects: |
Acoustics and Ultrasonics
Computer science Speech recognition speech pulse model Binary number Speech synthesis 02 engineering and technology computer.software_genre 030507 speech-language pathology & audiology 03 medical and health sciences speech synthesis acoustic model 0202 electrical engineering electronic engineering information engineering Computer Science (miscellaneous) Electrical and Electronic Engineering text-to-speech Representation (mathematics) Parametric statistics speech processing Acoustic model voice 020206 networking & telecommunications Speech processing Computational Mathematics Noise parametric speech synthesis 0305 other medical science computer Degradation (telecommunications) |
Source: | IEEE/ACM Transactions on Audio, Speech, and Language Processing |
DOI: | 10.17863/cam.21315 |
Description: | Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder, and one of its main causes is the reconstruction of the noise component. In this article, a new signal model is proposed that leads to a simple synthesizer without the need for ad-hoc tuning of model parameters. The model is not based on the traditional additive linear source-filter model; instead, it combines speech components that are additive in the log domain. The same representation is also used for voiced and unvoiced segments, rather than relying on binary voicing decisions, which avoids the voicing-error discontinuities that can occur in many current vocoders. A simple binary mask denotes the presence of noise in the time-frequency domain; this mask is less sensitive to classification errors than a binary voicing decision. Four experiments have been carried out to evaluate the new model. The first examines the noise reconstruction issue. The other three are listening tests that demonstrate the advantages of the model: a comparison with the STRAIGHT vocoder; the direct prediction of the binary noise mask using a mixed output configuration; and a partial improvement of creaky voice using a mask correction mechanism. Funding: European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie; 10.13039/501100000266-EPSRC |
Database: | OpenAIRE |
External link: |
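The Description says the model combines speech components that are additive in the log domain and uses a binary time-frequency mask to mark where noise is present. The sketch below illustrates that idea for a single pulse, under stated assumptions: it is not the authors' implementation, and the function name `log_domain_pulse_spectrum` and its parameters are hypothetical. It assumes a positive amplitude envelope V(f), a linear-phase term that places the pulse at time t_pulse, and a complex Gaussian noise spectrum N(f) normalized to unit expected energy. The log spectrum is formed as log S = -j·ω·t_pulse + log V + M·log N, so the noise term contributes only where the binary mask M equals 1.

```python
import cmath
import math
import random


def log_domain_pulse_spectrum(envelope, noise_mask, t_pulse, sample_rate, n_fft, seed=0):
    """Hypothetical sketch: one pulse spectrum built by adding terms in the log domain.

    envelope   -- positive amplitudes V(f), one per bin 0..n_fft // 2 (assumption)
    noise_mask -- binary mask M(f) per bin: 1 where noise is present, 0 otherwise
    t_pulse    -- pulse position in seconds (gives a linear-phase delay term)
    """
    n_bins = n_fft // 2 + 1
    if len(envelope) != n_bins or len(noise_mask) != n_bins:
        raise ValueError("envelope and noise_mask need n_fft // 2 + 1 bins")
    rng = random.Random(seed)
    sigma = math.sqrt(0.5)  # per-component std dev, so that E|N|^2 = 1
    spectrum = []
    for k in range(n_bins):
        omega = 2.0 * math.pi * k * sample_rate / n_fft  # angular frequency of bin k (rad/s)
        # Deterministic part: delay term plus log amplitude envelope.
        log_s = -1j * omega * t_pulse + math.log(envelope[k])
        # Draw noise for every bin so the random stream does not depend on the mask.
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        # Binary mask times log-noise: adds the noise term only where M(f) = 1.
        log_s += noise_mask[k] * cmath.log(noise)
        spectrum.append(cmath.exp(log_s))
    return spectrum


if __name__ == "__main__":
    env = [1.0, 2.0, 0.5, 3.0, 1.5]
    # Noise only in the two highest bins.
    spec = log_domain_pulse_spectrum(env, [0, 0, 0, 1, 1], t_pulse=0.001,
                                     sample_rate=16000, n_fft=8)
    for k, s in enumerate(spec):
        print(k, round(abs(s), 3))
```

Because the components multiply in the linear domain, adding them in the log domain means a masked-out bin keeps exactly the envelope magnitude, while a masked bin keeps the envelope but inherits random magnitude and phase from N(f). This is one way to read "additive in the log domain"; the paper's exact formulation may differ.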