Showing 1 - 10 of 94 results for search: '"Lei, Shun"'
Author:
Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Tan, Wei, Gu, Rongzhi, Lei, Shun, Lin, Zhiwei, Wu, Zhiyong
Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on mo…
External link:
http://arxiv.org/abs/2409.13216
Author:
Gao, Shuochen, Lei, Shun, Zhuo, Fan, Liu, Hangyu, Liu, Feng, Tang, Boshi, Huang, Qiaochu, Kang, Shiyin, Wu, Zhiyong
The Song Generation task aims to synthesize music composed of vocals and accompaniment from given lyrics. While the existing method, Jukebox, has explored this task, its constrained control over the generations often leads to deficiency in music perf…
External link:
http://arxiv.org/abs/2409.06307
Author:
Lei, Shun, Zhou, Yixuan, Tang, Boshi, Lam, Max W. Y., Liu, Feng, Liu, Hangyu, Wu, Jingcheng, Kang, Shiyin, Wu, Zhiyong, Meng, Helen
Music is an integral part of human culture, embodying human intelligence and creativity, of which songs compose an essential part. While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition…
External link:
http://arxiv.org/abs/2409.06029
Author:
Zhou, Yixuan, Qin, Xiaoyu, Jin, Zeyu, Zhou, Shuoyi, Lei, Shun, Zhou, Songtao, Wu, Zhiyong, Jia, Jia
Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generatio…
External link:
http://arxiv.org/abs/2408.15676
Author:
Ma, Yinghao, Øland, Anders, Ragni, Anton, Del Sette, Bleiz MacSen, Saitis, Charalampos, Donahue, Chris, Lin, Chenghua, Plachouras, Christos, Benetos, Emmanouil, Shatri, Elona, Morreale, Fabio, Zhang, Ge, Fazekas, György, Xia, Gus, Zhang, Huan, Manco, Ilaria, Huang, Jiawen, Guinot, Julien, Lin, Liwei, Marinelli, Luca, Lam, Max W. Y., Sharma, Megha, Kong, Qiuqiang, Dannenberg, Roger B., Yuan, Ruibin, Wu, Shangda, Wu, Shih-Lun, Dai, Shuqi, Lei, Shun, Kang, Shiyin, Dixon, Simon, Chen, Wenhu, Huang, Wenhao, Du, Xingjian, Qu, Xingwei, Tan, Xu, Li, Yizhi, Tian, Zeyue, Wu, Zhiyong, Wu, Zhizheng, Ma, Ziyang, Wang, Ziyu
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models…
External link:
http://arxiv.org/abs/2408.14340
This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the s…
External link:
http://arxiv.org/abs/2404.16619
Graph neural networks (GNNs) have exhibited impressive performance in modeling graph data as exemplified in various applications. Recently, the GNN calibration problem has attracted increasing attention, especially in cost-sensitive scenarios. Previo…
External link:
http://arxiv.org/abs/2312.11858
Author:
Chen, Liyang, Bao, Weihong, Lei, Shun, Tang, Boshi, Wu, Zhiyong, Kang, Shiyin, Huang, Haozhi, Meng, Helen
Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including fac…
External link:
http://arxiv.org/abs/2310.07236
Author:
Lei, Shun, Zhou, Yixuan, Chen, Liyang, Luo, Dan, Wu, Zhiyong, Wu, Xixin, Kang, Shiyin, Jiang, Tao, Zhou, Yahui, Han, Yuxing, Meng, Helen
Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveform into discrete acoustic tokens and modeling these tokens with the language model, recent language model-base…
External link:
http://arxiv.org/abs/2309.11977
Author:
Zhou, Shaohuan, Lei, Shun, You, Weiya, Tuo, Deyi, You, Yuren, Wu, Zhiyong, Kang, Shiyin, Meng, Helen
This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Ba…
External link:
http://arxiv.org/abs/2308.16836