Showing 1 - 10 of 52 for search: '"Zhu, Xinfa"'
Author:
Guo, Dake, Yao, Jixun, Zhu, Xinfa, Xia, Kangxiang, Guo, Zhao, Zhang, Ziyu, Wang, Yao, Liu, Jie, Xie, Lei
This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge (ICAGC). Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2.
External link:
http://arxiv.org/abs/2410.23815
Author:
Ma, Linhan, Zhu, Xinfa, Lv, Yuanjun, Wang, Zhichao, Wang, Ziqian, He, Wendi, Zhou, Hongbin, Xie, Lei
Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well …
External link:
http://arxiv.org/abs/2406.09844
Author:
Li, Hanzhao, Xue, Liumeng, Guo, Haohan, Zhu, Xinfa, Lv, Yuanjun, Xie, Lei, Chen, Yunlin, Yin, Hao, Li, Zhifei
The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence …
External link:
http://arxiv.org/abs/2406.07422
Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without r…
External link:
http://arxiv.org/abs/2406.05672
Accent transfer aims to transfer an accent from a source speaker to synthetic speech in the target speaker's voice. The main challenge is how to effectively disentangle speaker timbre and accent, which are entangled in speech. This paper presents a VI…
External link:
http://arxiv.org/abs/2312.16850
Language models (LMs) have recently shown superior performance in various speech generation tasks, demonstrating their powerful ability for semantic context modeling. Given the intrinsic similarity between speech generation and speech enhancement, h…
External link:
http://arxiv.org/abs/2312.09747
Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-…
External link:
http://arxiv.org/abs/2311.07179
This paper aims to build a multi-speaker expressive TTS system, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across speakers …
External link:
http://arxiv.org/abs/2310.17101
Language models (LMs) have recently flourished in natural language processing and computer vision, generating high-fidelity texts or images in various tasks. In contrast, current speech generative models are still struggling regarding speech quality …
External link:
http://arxiv.org/abs/2310.07246
Zero-shot speaker cloning aims to synthesize speech for any target speaker unseen during TTS system building, given only a single speech reference of the speaker at hand. Although more practical in real applications, the current zero-shot methods still …
External link:
http://arxiv.org/abs/2310.04004