Showing 1 - 10 of 39 for search: '"Zhang, Yongmao"'
Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without …
External link:
http://arxiv.org/abs/2406.05672
Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a … (an illustrative disentanglement sketch follows the link below)
External link:
http://arxiv.org/abs/2312.16850
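The abstract is cut off, but the core problem it names, disentangling accent from speaker timbre, has a well-known generic recipe: adversarial training through a gradient reversal layer (GRL). The sketch below is a minimal, hypothetical illustration of that recipe, not the paper's system; every class name and dimension here is invented for demonstration.

```python
# Hypothetical sketch: adversarial disentanglement with gradient reversal.
# Nothing here comes from the paper; it only illustrates the general idea.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Reverse (and scale) gradients so the upstream encoder is pushed
        # to *remove* the information the adversary predicts.
        return -ctx.lamb * grad_out, None

class AccentEncoder(nn.Module):
    def __init__(self, feat_dim=80, hid=256, n_speakers=100):
        super().__init__()
        self.enc = nn.GRU(feat_dim, hid, batch_first=True)
        # Adversarial speaker classifier: trained to find speaker identity
        # in the accent embedding; the GRL makes the encoder hide it.
        self.spk_adv = nn.Linear(hid, n_speakers)

    def forward(self, mel, lamb=1.0):
        _, h = self.enc(mel)            # h: (1, B, hid)
        accent_emb = h.squeeze(0)       # (B, hid)
        spk_logits = self.spk_adv(GradReverse.apply(accent_emb, lamb))
        return accent_emb, spk_logits

enc = AccentEncoder()
mel = torch.randn(4, 120, 80)           # (batch, frames, mel bins)
accent_emb, spk_logits = enc(mel)
print(accent_emb.shape, spk_logits.shape)  # (4, 256) and (4, 100)
```

During training, a cross-entropy loss on spk_logits trains the adversary while the reversed gradient pushes the accent embedding to discard speaker identity.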
Recently, text-guided content generation has received extensive attention. In this work, we explore the possibility of text-description-based speaker generation, i.e., using text prompts to control the speaker generation process. Specifically, we propose … (a toy prompt-to-speaker sketch follows the link below)
External link:
http://arxiv.org/abs/2310.05001
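As a hedged illustration of what text-prompt-controlled speaker generation can mean mechanically: map a free-text description to a speaker embedding, then condition a TTS decoder on it. The tokenizer, encoder, and sampling step below are stand-ins invented here, not the paper's architecture.

```python
# Hypothetical sketch: a text prompt (e.g., "a deep slow male voice")
# is mapped to a speaker embedding that conditions a TTS model.
import torch
import torch.nn as nn

class PromptToSpeaker(nn.Module):
    def __init__(self, vocab=10_000, hid=256, spk_dim=128):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, hid)   # toy text encoder
        self.mu = nn.Linear(hid, spk_dim)
        self.logvar = nn.Linear(hid, spk_dim)

    def forward(self, token_ids):
        h = self.emb(token_ids)
        mu, logvar = self.mu(h), self.logvar(h)
        # Sampling lets one prompt yield many distinct plausible voices.
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def tokenize(prompt, vocab=10_000):
    # Stand-in hashing tokenizer; a real system would use a trained one.
    return torch.tensor([[hash(w) % vocab for w in prompt.lower().split()]])

model = PromptToSpeaker()
spk_emb = model(tokenize("a deep slow male voice"))
print(spk_emb.shape)  # torch.Size([1, 128]) -> conditions the TTS decoder
```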
Previous multilingual text-to-speech (TTS) approaches have considered leveraging monolingual speaker data to enable cross-lingual speech synthesis. However, such data-efficient approaches have ignored synthesizing the emotional aspects of speech due to …
External link:
http://arxiv.org/abs/2307.15951
Author:
Song, Kun, Lei, Yi, Chen, Peikun, Cao, Yiqing, Wei, Kun, Zhang, Yongmao, Xie, Lei, Jiang, Ning, Zhao, Guoqing
This paper describes the NPU-MSXF system for the IWSLT 2023 speech-to-speech translation (S2ST) task, which aims to translate multi-source English speech into Chinese speech. The system is built in a cascaded manner consisting of automatic speech recognition … (a skeletal pipeline sketch follows the link below)
External link:
http://arxiv.org/abs/2307.04630
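For readers unfamiliar with cascaded S2ST, the overall structure the abstract describes is a chain of three models. The stub below only shows that plumbing; the component names and toy stand-ins are mine, not the NPU-MSXF models.

```python
# Minimal sketch of a cascaded S2ST pipeline (ASR -> MT -> TTS).
# The three components are placeholder stubs, not the paper's models.
from dataclasses import dataclass

@dataclass
class CascadedS2ST:
    asr: callable   # English speech -> English text
    mt: callable    # English text   -> Chinese text
    tts: callable   # Chinese text   -> Chinese speech (waveform)

    def translate(self, wav):
        text_en = self.asr(wav)
        text_zh = self.mt(text_en)
        return self.tts(text_zh)

# Toy stand-ins so the pipeline runs end to end:
pipeline = CascadedS2ST(
    asr=lambda wav: "hello world",
    mt=lambda en: "你好，世界",
    tts=lambda zh: [0.0] * 16000,   # 1 s of silence as placeholder audio
)
print(len(pipeline.translate([0.0] * 16000)))  # 16000
```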
Style transfer TTS has shown impressive performance in recent years. However, style control is often restricted to systems built on expressive speech recordings with discrete style categories. In practical situations, users may be interested in transferring …
External link:
http://arxiv.org/abs/2305.19522
This paper aims to synthesize the target speaker's speech with the desired speaking style and emotion by transferring the style and emotion from reference speech recorded by other speakers. We address this challenging problem with a two-stage framework … (an illustrative reference-encoder sketch follows the link below)
External link:
http://arxiv.org/abs/2211.10568
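A common building block for this kind of transfer, sketched here purely as an illustration, is a reference encoder that compresses a reference mel-spectrogram into a fixed style/emotion vector used to condition synthesis. This follows the spirit of GST-style modules and may differ from the paper's two-stage design.

```python
# Illustrative reference encoder: conv downsampling over a reference mel,
# then a GRU whose final state is the style/emotion embedding.
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    def __init__(self, mel_bins=80, style_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.gru = nn.GRU(64 * (mel_bins // 4), style_dim, batch_first=True)

    def forward(self, mel):                     # mel: (B, frames, 80)
        x = self.conv(mel.unsqueeze(1))         # (B, 64, frames/4, 20)
        x = x.transpose(1, 2).flatten(2)        # (B, frames/4, 64*20)
        _, h = self.gru(x)
        return h.squeeze(0)                     # (B, style_dim)

enc = ReferenceEncoder()
style = enc(torch.randn(4, 120, 80))
print(style.shape)  # torch.Size([4, 128])
```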
Author:
Zhang, Yongmao, Xue, Heyang, Li, Hanzhao, Xie, Lei, Guo, Tingwei, Zhang, Ruixiong, Gong, Caixia
The end-to-end singing voice synthesis (SVS) model VISinger can achieve better performance than the typical two-stage model with fewer parameters. However, VISinger has several problems, among them the text-to-phase problem: the end-to-end model learns a meaningless mapping from text to phase … (a small demonstration of why phase is ill-posed follows the link below)
External link:
http://arxiv.org/abs/2211.02903
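The text-to-phase problem can be demonstrated in a few lines: phase is largely arbitrary, so two waveforms with the same magnitude spectrogram can be entirely different signals, which is why asking a model to regress phase from text is ill-posed. The demo below uses random noise as a stand-in for audio and is not taken from the paper.

```python
# Build a spectrogram with the ORIGINAL magnitudes but RANDOM phase,
# then resynthesize: the waveform changes completely even though the
# magnitude content fed in is identical.
import torch

torch.manual_seed(0)
wav = torch.randn(16000)                  # stand-in "audio", 1 s @ 16 kHz
win = torch.hann_window(1024)
spec = torch.stft(wav, n_fft=1024, hop_length=256, window=win,
                  return_complex=True)
random_phase = torch.rand_like(spec.abs()) * 2 * torch.pi
wav2 = torch.istft(torch.polar(spec.abs(), random_phase),
                   n_fft=1024, hop_length=256, window=win, length=16000)
sim = torch.nn.functional.cosine_similarity(wav, wav2, dim=0)
print(float(sim))  # near 0: same magnitudes, very different waveform
```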
Author:
Song, Kun, Zhang, Yongmao, Lei, Yi, Cong, Jian, Li, Hanzhao, Xie, Lei, He, Gang, Bai, Jinfeng
Recent development of neural vocoders based on generative adversarial networks (GANs) has shown clear advantages in generating raw waveforms conditioned on mel-spectrograms, with fast inference speed and lightweight networks. However, it is still … (a minimal generator sketch follows the link below)
External link:
http://arxiv.org/abs/2211.01087
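As background for "generating raw waveform conditioned on mel-spectrogram": a GAN vocoder's generator is typically a stack of transposed convolutions whose strides multiply up to the hop size. The tiny generator below follows that standard MelGAN/HiFi-GAN-style layout as a sketch; it is not the network proposed in the paper.

```python
# Generic GAN-vocoder generator: transposed convolutions upsample an
# 80-bin mel-spectrogram (hop 256) back to a raw waveform.
import torch
import torch.nn as nn

class TinyVocoderG(nn.Module):
    def __init__(self, mel_bins=80):
        super().__init__()
        layers, ch = [nn.Conv1d(mel_bins, 256, 7, padding=3)], 256
        for r in (8, 8, 2, 2):                 # 8*8*2*2 = 256x upsampling
            layers += [nn.LeakyReLU(0.1),
                       nn.ConvTranspose1d(ch, ch // 2, 2 * r,
                                          stride=r, padding=r // 2)]
            ch //= 2
        layers += [nn.LeakyReLU(0.1), nn.Conv1d(ch, 1, 7, padding=3), nn.Tanh()]
        self.net = nn.Sequential(*layers)

    def forward(self, mel):                    # mel: (B, 80, frames)
        return self.net(mel)                   # (B, 1, frames * 256)

g = TinyVocoderG()
wav = g(torch.randn(1, 80, 40))
print(wav.shape)  # torch.Size([1, 1, 10240])
```

Each ConvTranspose1d with kernel 2r, stride r, and padding r/2 exactly multiplies the frame count by r, so the four stages together recover one sample per 1/256 of a frame.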
In the current two-stage neural text-to-speech (TTS) paradigm, it is ideal to have a universal neural vocoder that, once trained, is robust to imperfect mel-spectrograms predicted by the acoustic model. To this end, we propose the Robust MelGAN vocoder by … (a toy mel-corruption sketch follows the link below)
External link:
http://arxiv.org/abs/2210.17349
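One widely used recipe for making a vocoder robust to imperfect predicted mels, offered here only as a plausible illustration, is to corrupt ground-truth mels during training so the vocoder never sees perfectly clean input. Whether Robust MelGAN uses this exact trick is not stated in the snippet above.

```python
# Toy mel-corruption augmentation: additive noise plus a temporal blur
# that mimics the over-smoothing typical of acoustic-model predictions.
import torch
import torch.nn.functional as F

def corrupt_mel(mel, noise_std=0.1, blur=True):
    """mel: (B, 80, frames) log-mel; returns a degraded copy."""
    out = mel + noise_std * torch.randn_like(mel)
    if blur:
        # Depthwise 3-tap moving average along time, one filter per band.
        kernel = torch.ones(mel.size(1), 1, 3) / 3.0
        out = F.conv1d(out, kernel, padding=1, groups=mel.size(1))
    return out

mel = torch.randn(2, 80, 100)
print(corrupt_mel(mel).shape)  # torch.Size([2, 80, 100])
```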