Showing 1 - 7 of 7
for search: '"Tuo, Deyi"'
Author:
Zhou, Shaohuan, Lei, Shun, You, Weiya, Tuo, Deyi, You, Yuren, Wu, Zhiyong, Kang, Shiyin, Meng, Helen
This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses Bidirectional Encoder Representations from Transformers (BERT)-derived semantic embeddings to improve the expressiveness of the synthesized singing voice. …
External link:
http://arxiv.org/abs/2308.16836
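The snippet cuts off before the system's details; a common way to turn BERT's token-level hidden states into a single sentence-level semantic embedding (the kind of feature such a synthesizer might consume) is mean pooling. The following is a minimal pure-Python sketch with toy vectors standing in for BERT outputs — the function name and data are illustrative assumptions, not taken from the paper:

```python
def mean_pool(token_states):
    """Average token-level hidden states into one sentence-level embedding."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n for d in range(dim)]

# Toy stand-ins for last-layer hidden states of a 3-token phrase (dim = 2).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
embedding = mean_pool(states)  # one vector summarizing the phrase
```

In practice the pooled vector would come from a pretrained BERT model and be concatenated with the acoustic/linguistic inputs of the synthesizer.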
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance…
External link:
http://arxiv.org/abs/2308.16577
Cover song identification (CSI) focuses on finding the same music in different versions among reference anchors, given a query track. In this paper, we propose a novel system named CoverHunter that overcomes the shortcomings of existing detec…
External link:
http://arxiv.org/abs/2306.09025
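The abstract is truncated before CoverHunter's specifics, but the retrieval step common to CSI systems — ranking reference anchors by embedding similarity to the query track — can be sketched generically. The function, toy vectors, and ranking scheme below are illustrative assumptions, not the paper's method:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_references(query, refs):
    """Return reference indices best-first, plus the raw similarities."""
    sims = [cosine(query, r) for r in refs]
    order = sorted(range(len(refs)), key=lambda i: -sims[i])
    return order, sims

# Toy usage: three reference anchors; the second points the same way as the query.
query = [1.0, 0.0]
refs = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
order, sims = rank_references(query, refs)  # order[0] is the best match
```

Real systems learn the embeddings so that covers of the same song land close together; the ranking step itself stays this simple.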
The previously proposed FullSubNet achieved outstanding performance in the Deep Noise Suppression (DNS) Challenge and attracted much attention. However, it still encounters issues such as input-output mismatch and coarse processing of frequency bands. …
External link:
http://arxiv.org/abs/2203.12188
Non-parallel voice conversion (VC) has achieved considerable breakthroughs recently by introducing bottleneck features (BNFs) extracted by an automatic speech recognition (ASR) model. However, the selection of BNFs has a significant impact on…
External link:
http://arxiv.org/abs/2203.12813
Author:
Huang, Huirong, Wu, Zhiyong, Kang, Shiyin, Dai, Dongyang, Jia, Jia, Fu, Tianxiao, Tuo, Deyi, Lei, Guangzhi, Liu, Peng, Su, Dan, Yu, Dong, Meng, Helen
Generating 3D speech-driven talking heads has received increasing attention in recent years. Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unr…
External link:
http://arxiv.org/abs/2006.11610
Author:
Yu, Chengzhu, Lu, Heng, Hu, Na, Yu, Meng, Weng, Chao, Xu, Kun, Liu, Peng, Tuo, Deyi, Kang, Shiyin, Lei, Guangzhi, Su, Dan, Yu, Dong
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expressions simultaneously. The key component of this system is the Duration Informed Attention Network (DurIAN), an autoregressive…
External link:
http://arxiv.org/abs/1909.01700