Showing 1 - 10 of 52 results for the search: '"Gu, Rongzhi"'
Author:
Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Tan, Wei, Gu, Rongzhi, Lei, Shun, Lin, Zhiwei, Wu, Zhiyong
Music codecs are a vital aspect of audio codec research, and ultra-low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on mo…
External link:
http://arxiv.org/abs/2409.13216
We introduce Gull, a generative multifunctional audio codec. Gull is a general-purpose neural audio compression and decompression model which can be applied to a wide range of tasks and applications such as real-time communication, audio super-resolu…
External link:
http://arxiv.org/abs/2404.04947
Author:
Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Huang, Qiaochu, Wu, Zhiyong, Zhang, Shixiong, Li, Guangzhi, Luo, Yi, Gu, Rongzhi
Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set…
External link:
http://arxiv.org/abs/2312.10381
Author:
Gu, Rongzhi, Luo, Yi
We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. The R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific…
External link:
http://arxiv.org/abs/2308.16892
Echo cancellation and noise reduction are essential for full-duplex communication, yet most existing neural networks have high computational costs and are inflexible in tuning model complexity. In this paper, we introduce time-frequency dual-path com…
External link:
http://arxiv.org/abs/2308.11053
Author:
Uhlich, Stefan, Fabbro, Giorgio, Hirano, Masato, Takahashi, Shusuke, Wichern, Gordon, Roux, Jonathan Le, Chakraborty, Dipam, Mohanty, Sharada, Li, Kai, Luo, Yi, Yu, Jianwei, Gu, Rongzhi, Solovyev, Roman, Stempkovskiy, Alexander, Habruseva, Tatiana, Sukhovei, Mikhail, Mitsufuji, Yuki
This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. In particular, we detail…
External link:
http://arxiv.org/abs/2308.06981
Author:
Luo, Yi, Gu, Rongzhi
Modern neural-network-based speech processing systems are typically required to be robust against reverberation, and the training of such systems thus needs a large amount of reverberant data. During the training of the systems, on-the-fly simulation…
External link:
http://arxiv.org/abs/2304.08052
Multi-channel speech separation using the speaker's directional information has demonstrated significant gains over blind speech separation. However, it has two limitations. First, substantial performance degradation is observed when the coming direction…
External link:
http://arxiv.org/abs/2302.13462
Recently, frequency-domain all-neural beamforming methods have achieved remarkable progress for multichannel speech separation. In parallel, the integration of time-domain network structures and beamforming has also gained significant attention. This study…
External link:
http://arxiv.org/abs/2212.08348
Deep speaker embeddings have shown promising results in speaker recognition, as well as in other speaker-related tasks. However, some issues are still underexplored, for instance, the information encoded in these representations and their influence…
External link:
http://arxiv.org/abs/2212.07068