Search results

Report

Video-to-Audio Generation with Fine-grained Temporal Semantics

Authors: Hu, Yuchen; Gu, Yu; Li, Chenxing; Chen, Rilin; Yu, Dong

With recent advances of AIGC, video generation has gained a surge of research interest in both academia and industry (e.g., Sora). However, it remains a challenge to produce temporally aligned audio to synchronize the generated video, considering th

External link: http://arxiv.org/abs/2409.14709

Report

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Authors: Ren, Yong; Li, Chenxing; Xu, Manjie; Liang, Wei; Gu, Yu; Chen, Rilin; Yu, Dong

Visual and auditory perception are two crucial ways humans experience the world. Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this

External link: http://arxiv.org/abs/2409.08601

Report

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

Authors: Wang, Helin; Yu, Meng; Hai, Jiarui; Chen, Chen; Hu, Yuchen; Chen, Rilin; Dehak, Najim; Yu, Dong

In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis. SSR-Speech is built on a Transformer decoder and incorporates classifi

External link: http://arxiv.org/abs/2409.07556

Report

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

Authors: Chen, Shihao; Gu, Yu; Cui, Jianwei; Zhang, Jie; Chen, Rilin; Dai, Lirong

Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However, many diffusion model based any-to-any SVC methods, which have achieved impressive results, usually suffered from lo

External link: http://arxiv.org/abs/2408.12354

Report

Video-to-Audio Generation with Hidden Alignment

Authors: Xu, Manjie; Li, Chenxing; Tu, Xinyi; Ren, Yong; Chen, Rilin; Gu, Yu; Liang, Wei; Yu, Dong

Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insigh

External link: http://arxiv.org/abs/2407.07464

Report

SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression

Authors: Sun, Zhihang; Li, Andong; Chen, Rilin; Zhang, Hao; Yu, Meng; Zhou, Yi; Yu, Dong

The proliferation of deep neural networks has spawned the rapid development of acoustic echo cancellation and noise suppression, and plenty of prior arts have been proposed, which yield promising performance. Nevertheless, they rarely consider the de

External link: http://arxiv.org/abs/2406.11175

Report

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

Authors: Chen, Shihao; Gu, Yu; Zhang, Jie; Li, Na; Chen, Rilin; Chen, Liping; Dai, Lirong

Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue o

External link: http://arxiv.org/abs/2406.05325

Report

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Authors: Zhu, Qiushi; Gu, Yu; Chen, Rilin; Weng, Chao; Hu, Yuchen; Dai, Lirong; Zhang, Jie

Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by sp

External link: http://arxiv.org/abs/2308.14553

Academic article

Joint dereverberation and blind source separation using a hybrid autoregressive and convolutive transfer function-based model

Authors: Liu, Shengdong; Yang, Feiran; Chen, Rilin; Yang, Jun

Published in: Applied Acoustics, 5 September 2024, 224
