Showing 1 - 9 of 9 for search: "Chen, Rilin"
With recent advances in AIGC, video generation has gained a surge of research interest in both academia and industry (e.g., Sora). However, it remains a challenge to produce temporally aligned audio that is synchronized with the generated video, considering th
External link:
http://arxiv.org/abs/2409.14709
Visual and auditory perception are two crucial ways humans experience the world. Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this
External link:
http://arxiv.org/abs/2409.08601
Authors:
Wang, Helin, Yu, Meng, Hai, Jiarui, Chen, Chen, Hu, Yuchen, Chen, Rilin, Dehak, Najim, Yu, Dong
In this paper, we introduce SSR-Speech, a neural codec autoregressive model designed for stable, safe, and robust zero-shot text-based speech editing and text-to-speech synthesis. SSR-Speech is built on a Transformer decoder and incorporates classifi
External link:
http://arxiv.org/abs/2409.07556
Any-to-any singing voice conversion (SVC) aims to transfer a target singer's timbre to other songs using a short voice sample. However, many diffusion-model-based any-to-any SVC methods, which have achieved impressive results, usually suffer from lo
External link:
http://arxiv.org/abs/2408.12354
Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insigh
External link:
http://arxiv.org/abs/2407.07464
The proliferation of deep neural networks has spurred the rapid development of acoustic echo cancellation and noise suppression, and many prior methods have been proposed that yield promising performance. Nevertheless, they rarely consider the de
External link:
http://arxiv.org/abs/2406.11175
Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue o
External link:
http://arxiv.org/abs/2406.05325
Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by sp
External link:
http://arxiv.org/abs/2308.14553
Published in:
Applied Acoustics, vol. 224, 5 September 2024