Showing 1 - 10 of 555 for search: '"Liu,Jinglin"'
Author:
Ye, Zhenhui, Zhong, Tianyun, Ren, Yi, Jiang, Ziyue, Huang, Jiawei, Huang, Rongjie, Liu, Jinglin, He, Jinzheng, Zhang, Chen, Wang, Zehan, Chen, Xize, Yin, Xiang, Zhao, Zhou
Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos. Personalized TFG is a variant that emphasizes the perceptual identity similarity of the synthesized result (from the perspective of appearance…
External link:
http://arxiv.org/abs/2410.06734
Author:
Huang, Jiawei, Zhang, Chen, Ren, Yi, Jiang, Ziyue, Ye, Zhenhui, Liu, Jinglin, He, Jinzheng, Yin, Xiang, Zhao, Zhou
Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite recent notable advancements in voice conversion, multi-lingual voice conversion (including both monoli…
External link:
http://arxiv.org/abs/2408.04708
Author:
Ye, Zhenhui, Zhong, Tianyun, Ren, Yi, Yang, Jiaqi, Li, Weichuang, Huang, Jiawei, Jiang, Ziyue, He, Jinzheng, Huang, Rongjie, Liu, Jinglin, Zhang, Chen, Yin, Xiang, Ma, Zejun, Zhao, Zhou
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of a…
External link:
http://arxiv.org/abs/2401.08503
Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Ad…
External link:
http://arxiv.org/abs/2308.15016
Author:
Jiang, Ziyue, Liu, Jinglin, Ren, Yi, He, Jinzheng, Ye, Zhenhui, Ji, Shengpeng, Yang, Qian, Zhang, Chen, Wei, Pengfei, Wang, Chunfeng, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping the fine-tuning process. However, the prompting mechanisms of zero-sh…
External link:
http://arxiv.org/abs/2307.07218
Published in:
COMPEL - The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 2024, Vol. 43, Issue 5, pp. 1067-1079.
Author:
Jiang, Ziyue, Ren, Yi, Ye, Zhenhui, Liu, Jinglin, Zhang, Chen, Yang, Qian, Ji, Shengpeng, Huang, Rongjie, Wang, Chunfeng, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS. However, previous works usually encode speech into latent using audio codec…
External link:
http://arxiv.org/abs/2306.03509
Author:
Ye, Zhenhui, Jiang, Ziyue, Ren, Yi, Liu, Jinglin, Zhang, Chen, Yin, Xiang, Ma, Zejun, Zhao, Zhou
We are interested in a novel task, namely low-resource text-to-talking avatar. Given only a few-minute-long talking person video with the audio track as the training data and arbitrary texts as the driving input, we aim to synthesize high-quality tal…
External link:
http://arxiv.org/abs/2306.03504
Author:
Huang, Jiawei, Ren, Yi, Huang, Rongjie, Yang, Dongchao, Ye, Zhenhui, Zhang, Chen, Liu, Jinglin, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Large diffusion models have been successful in text-to-audio (T2A) synthesis tasks, but they often suffer from common issues such as semantic misalignment and poor temporal consistency due to limited natural language understanding and data scarcity.
External link:
http://arxiv.org/abs/2305.18474
Author:
Huang, Rongjie, Liu, Huadai, Cheng, Xize, Ren, Yi, Li, Linjun, Ye, Zhenhui, He, Jinzheng, Zhang, Lichao, Liu, Jinglin, Yin, Xiang, Zhao, Zhou
Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date. Despite the recent success, current S2ST models still suffer from distinct degradation in noisy envir…
External link:
http://arxiv.org/abs/2305.15403