Showing 1 - 10
of 1,673
for search: '"statistical parametric speech synthesis"'
Author:
Cheng, Pengyu, Ling, Zhenhua
In this paper, we propose a method of speaker adaption with intuitive prosodic features for statistical parametric speech synthesis. The intuitive prosodic features employed in this method include pitch, pitch range, speech rate and energy considerin
External link:
http://arxiv.org/abs/2203.00951
In recent years, statistical parametric speech synthesis (SPSS) systems have been widely utilized in many interactive speech-based systems (e.g., Amazon's Alexa, Bose's headphones). To select a suitable SPSS system, both speech quality and performance
External link:
http://arxiv.org/abs/2005.12962
Neural waveform models such as WaveNet have demonstrated better performance than conventional vocoders for statistical parametric speech synthesis. As an autoregressive (AR) model, WaveNet is limited by a slow sequential waveform generation process.
External link:
http://arxiv.org/abs/1904.12088
Author:
Ai, Yang, Ling, Zhen-Hua
This paper presents a neural vocoder named HiNet which reconstructs speech waveforms from acoustic features by predicting amplitude and phase spectra hierarchically. Different from existing neural vocoders such as WaveNet, SampleRNN and WaveRNN which
External link:
http://arxiv.org/abs/1906.09573
Published in:
Interspeech-2017
Recent studies have shown that text-to-speech synthesis quality can be improved by using glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal excitation and vocal tract, that occur in the human speech product
External link:
http://arxiv.org/abs/1903.05955
Neural networks with auto-regressive structures, such as Recurrent Neural Networks (RNNs), have become the most appealing structures for acoustic modeling of parametric text-to-speech synthesis (TTS) in recent studies. Despite the prominent capacity t
External link:
http://arxiv.org/abs/1811.12208
Output from statistical parametric speech synthesis (SPSS) remains noticeably worse than natural speech recordings in terms of quality, naturalness, speaker similarity, and intelligibility in noise. There are many hypotheses regarding the origins of
External link:
http://arxiv.org/abs/1807.10941
Author:
Gu, Yu, Kang, Yongguo
This paper introduces an improved generative model for statistical parametric speech synthesis (SPSS) based on WaveNet under a multi-task learning framework. Different from the original WaveNet model, the proposed Multi-task WaveNet employs the frame
External link:
http://arxiv.org/abs/1806.08619
Author:
Reddy, M Kiran, Rao, K Sreenivasa
Published in:
Computer Speech & Language, March 2020, 60