Showing 1 - 7 of 7
for search: '"Tuo, Deyi"'
Author:
Zhou, Shaohuan, Lei, Shun, You, Weiya, Tuo, Deyi, You, Yuren, Wu, Zhiyong, Kang, Shiyin, Meng, Helen
This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses Bidirectional Encoder Representations from Transformers (BERT)-derived semantic embeddings to improve the expressiveness of the synthesized singing voice. …
External link:
http://arxiv.org/abs/2308.16836
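The snippet cuts off before the system's details; a common way to turn BERT's token-level hidden states into a single sentence-level semantic embedding (the kind of feature such a synthesizer might consume) is mean pooling. The following is a minimal pure-Python sketch with toy vectors standing in for BERT outputs — the function name and data are illustrative assumptions, not taken from the paper:

```python
def mean_pool(token_states):
    """Average token-level hidden states into one sentence-level embedding."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n for d in range(dim)]

# Toy stand-ins for last-layer hidden states of a 3-token phrase (dim = 2).
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
embedding = mean_pool(states)  # one vector summarizing the phrase
```

In practice the pooled vector would come from a pretrained BERT model and be concatenated with the acoustic/linguistic inputs of the synthesizer.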
For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance…
External link:
http://arxiv.org/abs/2308.16577
Cover song identification (CSI) focuses on finding the same music in different versions among reference anchors, given a query track. In this paper, we propose a novel system named CoverHunter that overcomes the shortcomings of existing detec…
External link:
http://arxiv.org/abs/2306.09025
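The abstract is truncated before CoverHunter's specifics, but the retrieval step common to CSI systems — ranking reference anchors by embedding similarity to the query track — can be sketched generically. The function, toy vectors, and ranking scheme below are illustrative assumptions, not the paper's method:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_references(query, refs):
    """Return reference indices best-first, plus the raw similarities."""
    sims = [cosine(query, r) for r in refs]
    order = sorted(range(len(refs)), key=lambda i: -sims[i])
    return order, sims

# Toy usage: three reference anchors; the second points the same way as the query.
query = [1.0, 0.0]
refs = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
order, sims = rank_references(query, refs)  # order[0] is the best match
```

Real systems learn the embeddings so that covers of the same song land close together; the ranking step itself stays this simple.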
The previously proposed FullSubNet achieved outstanding performance in the Deep Noise Suppression (DNS) Challenge and attracted much attention. However, it still encounters issues such as input-output mismatch and coarse processing of frequency bands. …
External link:
http://arxiv.org/abs/2203.12188
Non-parallel voice conversion (VC) has achieved considerable breakthroughs recently by introducing bottleneck features (BNFs) extracted by an automatic speech recognition (ASR) model. However, the selection of BNFs has a significant impact on…
External link:
http://arxiv.org/abs/2203.12813
Author:
Huang, Huirong, Wu, Zhiyong, Kang, Shiyin, Dai, Dongyang, Jia, Jia, Fu, Tianxiao, Tuo, Deyi, Lei, Guangzhi, Liu, Peng, Su, Dan, Yu, Dong, Meng, Helen
Generating 3D speech-driven talking heads has received increasing attention in recent years. Recent approaches mainly have the following limitations: 1) most speaker-independent methods need handcrafted features that are time-consuming to design or unr…
External link:
http://arxiv.org/abs/2006.11610
Author:
Yu, Chengzhu, Lu, Heng, Hu, Na, Yu, Meng, Weng, Chao, Xu, Kun, Liu, Peng, Tuo, Deyi, Kang, Shiyin, Lei, Guangzhi, Su, Dan, Yu, Dong
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expressions simultaneously. The key component of this system is the Duration Informed Attention Network (DurIAN), an autoregressive…
External link:
http://arxiv.org/abs/1909.01700