Showing 1 - 10 of 555 for search: '"Liu,Jinglin"'
Author:
Ye, Zhenhui, Zhong, Tianyun, Ren, Yi, Jiang, Ziyue, Huang, Jiawei, Huang, Rongjie, Liu, Jinglin, He, Jinzheng, Zhang, Chen, Wang, Zehan, Chen, Xize, Yin, Xiang, Zhao, Zhou
Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos. Personalized TFG is a variant that emphasizes the perceptual identity similarity of the synthesized result (from the perspective of appearance…
External link:
http://arxiv.org/abs/2410.06734
Author:
Huang, Jiawei, Zhang, Chen, Ren, Yi, Jiang, Ziyue, Ye, Zhenhui, Liu, Jinglin, He, Jinzheng, Yin, Xiang, Zhao, Zhou
Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite recent notable advancements in voice conversion, multi-lingual voice conversion (including both monoli…
External link:
http://arxiv.org/abs/2408.04708
Author:
Ye, Zhenhui, Zhong, Tianyun, Ren, Yi, Yang, Jiaqi, Li, Weichuang, Huang, Jiawei, Jiang, Ziyue, He, Jinzheng, Huang, Rongjie, Liu, Jinglin, Zhang, Chen, Yin, Xiang, Ma, Zejun, Zhao, Zhou
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of a…
External link:
http://arxiv.org/abs/2401.08503
Co-speech gesture generation is crucial for automatic digital avatar animation. However, existing methods suffer from issues such as unstable training and temporal inconsistency, particularly in generating high-fidelity and comprehensive gestures. Ad…
External link:
http://arxiv.org/abs/2308.15016
Author:
Jiang, Ziyue, Liu, Jinglin, Ren, Yi, He, Jinzheng, Ye, Zhenhui, Ji, Shengpeng, Yang, Qian, Zhang, Chen, Wei, Pengfei, Wang, Chunfeng, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping the fine-tuning process. However, the prompting mechanisms of zero-sh…
External link:
http://arxiv.org/abs/2307.07218
Published in:
COMPEL - The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 2024, Vol. 43, Issue 5, pp. 1067-1079.
Author:
Jiang, Ziyue, Ren, Yi, Ye, Zhenhui, Liu, Jinglin, Zhang, Chen, Yang, Qian, Ji, Shengpeng, Huang, Rongjie, Wang, Chunfeng, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in achieving timbre and speech style generalization, particularly in zero-shot TTS. However, previous works usually encode speech into latent using audio codec…
External link:
http://arxiv.org/abs/2306.03509
Author:
Ye, Zhenhui, Jiang, Ziyue, Ren, Yi, Liu, Jinglin, Zhang, Chen, Yin, Xiang, Ma, Zejun, Zhao, Zhou
We are interested in a novel task, namely low-resource text-to-talking avatar. Given only a few-minute-long talking person video with the audio track as the training data and arbitrary texts as the driving input, we aim to synthesize high-quality tal…
External link:
http://arxiv.org/abs/2306.03504
Author:
Huang, Jiawei, Ren, Yi, Huang, Rongjie, Yang, Dongchao, Ye, Zhenhui, Zhang, Chen, Liu, Jinglin, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Large diffusion models have been successful in text-to-audio (T2A) synthesis tasks, but they often suffer from common issues such as semantic misalignment and poor temporal consistency due to limited natural language understanding and data scarcity.
External link:
http://arxiv.org/abs/2305.18474
Author:
Huang, Rongjie, Liu, Huadai, Cheng, Xize, Ren, Yi, Li, Linjun, Ye, Zhenhui, He, Jinzheng, Zhang, Lichao, Liu, Jinglin, Yin, Xiang, Zhao, Zhou
Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another, and has demonstrated significant progress to date. Despite the recent success, current S2ST models still suffer from distinct degradation in noisy envir…
External link:
http://arxiv.org/abs/2305.15403