Showing 1 - 10 of 32
for search: '"He, JinZheng"'
Author:
Ye, Zhenhui, Zhong, Tianyun, Ren, Yi, Jiang, Ziyue, Huang, Jiawei, Huang, Rongjie, Liu, Jinglin, He, Jinzheng, Zhang, Chen, Wang, Zehan, Chen, Xize, Yin, Xiang, Zhao, Zhou
Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos. Personalized TFG is a variant that emphasizes the perceptual identity similarity of the synthesized result (from the perspective of appearance ...
External link:
http://arxiv.org/abs/2410.06734
Author:
Zhang, Yu, Jiang, Ziyue, Li, Ruiqi, Pan, Changhao, He, Jinzheng, Huang, Rongjie, Wang, Chuxin, Zhao, Zhou
Published in:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1960-1975
Zero-shot singing voice synthesis (SVS) with style transfer and style control aims to generate high-quality singing voices with unseen timbres and styles (including singing method, emotion, rhythm, technique, and pronunciation) from audio and text prompts ...
External link:
http://arxiv.org/abs/2409.15977
Author:
Wu, Siwei, He, Jinzheng, Yuan, Ruibin, Wei, Haojie, Wei, Xipin, Lin, Chenghua, Xu, Jin, Lin, Junyang
The quantity of processed data is crucial for advancing the field of singing voice synthesis. While there are tools available for lyric or note transcription tasks, they all need pre-processed data which is relatively time-consuming (e.g., vocal and ...
External link:
http://arxiv.org/abs/2409.14619
Author:
Zhang, Yu, Pan, Changhao, Guo, Wenxiang, Li, Ruiqi, Zhu, Zhiyuan, Wang, Jialei, Xu, Wenhao, Lu, Jingyu, Hong, Zhiqing, Wang, Chuxin, Zhang, LiChao, He, Jinzheng, Jiang, Ziyue, Chen, Yuxin, Yang, Chen, Zhou, Jiecheng, Cheng, Xinyu, Zhao, Zhou
The scarcity of high-quality and multi-task singing datasets significantly hinders the development of diverse controllable and personalized singing tasks, as existing singing datasets suffer from low quality, limited diversity of languages and singers ...
External link:
http://arxiv.org/abs/2409.13832
Author:
Huang, Jiawei, Zhang, Chen, Ren, Yi, Jiang, Ziyue, Ye, Zhenhui, Liu, Jinglin, He, Jinzheng, Yin, Xiang, Zhao, Zhou
Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite notable advancements in voice conversion these days, multi-lingual voice conversion (including both monolingual ...
External link:
http://arxiv.org/abs/2408.04708
Author:
Chu, Yunfei, Xu, Jin, Yang, Qian, Wei, Haojie, Wei, Xipin, Guo, Zhifang, Leng, Yichong, Lv, Yuanjun, He, Jinzheng, Lin, Junyang, Zhou, Chang, Zhou, Jingren
We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions ...
External link:
http://arxiv.org/abs/2407.10759
Author:
Yang, An, Yang, Baosong, Hui, Binyuan, Zheng, Bo, Yu, Bowen, Zhou, Chang, Li, Chengpeng, Li, Chengyuan, Liu, Dayiheng, Huang, Fei, Dong, Guanting, Wei, Haoran, Lin, Huan, Tang, Jialong, Wang, Jialin, Yang, Jian, Tu, Jianhong, Zhang, Jianwei, Ma, Jianxin, Yang, Jianxin, Xu, Jin, Zhou, Jingren, Bai, Jinze, He, Jinzheng, Lin, Junyang, Dang, Kai, Lu, Keming, Chen, Keqin, Yang, Kexin, Li, Mei, Xue, Mingfeng, Ni, Na, Zhang, Pei, Wang, Peng, Peng, Ru, Men, Rui, Gao, Ruize, Lin, Runji, Wang, Shijie, Bai, Shuai, Tan, Sinan, Zhu, Tianhang, Li, Tianhao, Liu, Tianyu, Ge, Wenbin, Deng, Xiaodong, Zhou, Xiaohuan, Ren, Xingzhang, Zhang, Xinyu, Wei, Xipin, Ren, Xuancheng, Liu, Xuejing, Fan, Yang, Yao, Yang, Zhang, Yichang, Wan, Yu, Chu, Yunfei, Liu, Yuqiong, Cui, Zeyu, Zhang, Zhenru, Guo, Zhifang, Fan, Zhihao
This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to ...
External link:
http://arxiv.org/abs/2407.10671
Author:
Ye, Zhenhui, Zhong, Tianyun, Ren, Yi, Yang, Jiaqi, Li, Weichuang, Huang, Jiawei, Jiang, Ziyue, He, Jinzheng, Huang, Rongjie, Liu, Jinglin, Zhang, Chen, Yin, Xiang, Ma, Zejun, Zhao, Zhou
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of a ...
External link:
http://arxiv.org/abs/2401.08503
Author:
Zhang, Yu, Huang, Rongjie, Li, Ruiqi, He, JinZheng, Xia, Yan, Chen, Feiyang, Duan, Xinyu, Huai, Baoxing, Zhao, Zhou
Published in:
Proceedings of the AAAI Conference on Artificial Intelligence, 38(17), 19597-19605. (2024)
Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) focuses on generating high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples.
External link:
http://arxiv.org/abs/2312.10741
Author:
Jiang, Ziyue, Liu, Jinglin, Ren, Yi, He, Jinzheng, Ye, Zhenhui, Ji, Shengpeng, Yang, Qian, Zhang, Chen, Wei, Pengfei, Wang, Chunfeng, Yin, Xiang, Ma, Zejun, Zhao, Zhou
Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts, which significantly reduces the data and computation requirements for voice cloning by skipping the fine-tuning process. However, the prompting mechanisms of zero-shot ...
External link:
http://arxiv.org/abs/2307.07218