Search results

Report

Authors: Xu, Yaoxun; Chen, Hangting; Yu, Jianwei; Tan, Wei; Gu, Rongzhi; Lei, Shun; Lin, Zhiwei; Wu, Zhiyong

Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on mo

External link: http://arxiv.org/abs/2409.13216

Report

An End-to-End Approach for Chord-Conditioned Song Generation

Authors: Gao, Shuochen; Lei, Shun; Zhuo, Fan; Liu, Hangyu; Liu, Feng; Tang, Boshi; Huang, Qiaochu; Kang, Shiyin; Wu, Zhiyong

The Song Generation task aims to synthesize music composed of vocals and accompaniment from given lyrics. While the existing method, Jukebox, has explored this task, its constrained control over the generations often leads to deficiency in music perf

External link: http://arxiv.org/abs/2409.06307

Report

SongCreator: Lyrics-based Universal Song Generation

Authors: Lei, Shun; Zhou, Yixuan; Tang, Boshi; Lam, Max W. Y.; Liu, Feng; Liu, Hangyu; Wu, Jingcheng; Kang, Shiyin; Wu, Zhiyong; Meng, Helen

Music is an integral part of human culture, embodying human intelligence and creativity, of which songs compose an essential part. While various aspects of song generation have been explored by previous works, such as singing voice, vocal composition

External link: http://arxiv.org/abs/2409.06029

Report

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling

Authors: Zhou, Yixuan; Qin, Xiaoyu; Jin, Zeyu; Zhou, Shuoyi; Lei, Shun; Zhou, Songtao; Wu, Zhiyong; Jia, Jia

Recent AIGC systems possess the capability to generate digital multimedia content based on human language instructions, such as text, image and video. However, when it comes to speech, existing methods related to human instruction-to-speech generatio

External link: http://arxiv.org/abs/2408.15676

Report

Foundation Models for Music: A Survey

In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models

External link: http://arxiv.org/abs/2408.14340

Report

The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge

Authors: Zhou, Yixuan; Zhou, Shuoyi; Lei, Shun; Wu, Zhiyong; Wu, Menglin

This paper presents the multi-speaker multi-lingual few-shot voice cloning system developed by THU-HCSI team for LIMMITS'24 Challenge. To achieve high speaker similarity and naturalness in both mono-lingual and cross-lingual scenarios, we build the s

External link: http://arxiv.org/abs/2404.16619

Report

SimCalib: Graph Neural Network Calibration based on Similarity between Nodes

Authors: Tang, Boshi; Wu, Zhiyong; Wu, Xixin; Huang, Qiaochu; Chen, Jun; Lei, Shun; Meng, Helen

Graph neural networks (GNNs) have exhibited impressive performance in modeling graph data as exemplified in various applications. Recently, the GNN calibration problem has attracted increasing attention, especially in cost-sensitive scenarios. Previo

External link: http://arxiv.org/abs/2312.11858

Report

AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation

Authors: Chen, Liyang; Bao, Weihong; Lei, Shun; Tang, Boshi; Wu, Zhiyong; Kang, Shiyin; Huang, Haozhi; Meng, Helen

Speech-driven 3D facial animation aims at generating facial movements that are synchronized with the driving speech, which has been widely explored recently. Existing works mostly neglect the person-specific talking style in generation, including fac

External link: http://arxiv.org/abs/2310.07236

Report

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Authors: Lei, Shun; Zhou, Yixuan; Chen, Liyang; Luo, Dan; Wu, Zhiyong; Wu, Xixin; Kang, Shiyin; Jiang, Tao; Zhou, Yahui; Han, Yuxing; Meng, Helen

Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveform into discrete acoustic tokens and modeling these tokens with the language model, recent language model-base

External link: http://arxiv.org/abs/2309.11977

Report

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

Authors: Zhou, Shaohuan; Lei, Shun; You, Weiya; Tuo, Deyi; You, Yuren; Wu, Zhiyong; Kang, Shiyin; Meng, Helen

This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Ba

External link: http://arxiv.org/abs/2308.16836
