Showing 1 - 10 of 1,810 for search: '"Wang, Wenwu"'
We propose a sequential Monte Carlo algorithm for parameter learning when the studied model exhibits random discontinuous jumps in behaviour. To facilitate the learning of high dimensional parameter sets, such as those associated to neural networks,
External link:
http://arxiv.org/abs/2410.00620
Language-queried audio source separation (LASS) focuses on separating sounds using textual descriptions of the desired sources. Current methods mainly use discriminative approaches, such as time-frequency masking, to separate target sounds and minimi
External link:
http://arxiv.org/abs/2409.07614
Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework
External link:
http://arxiv.org/abs/2407.14329
Authors:
Zhao, Junqi, Liu, Xubo, Zhao, Jinzheng, Yuan, Yi, Kong, Qiuqiang, Plumbley, Mark D., Wang, Wenwu
Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging de
External link:
http://arxiv.org/abs/2407.11745
Authors:
Xiao, Feiyang, Guan, Jian, Zhu, Qiaoxi, Liu, Xubo, Wang, Wenbo, Qi, Shuhan, Zhang, Kejia, Sun, Jianyuan, Wang, Wenwu
Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, t
External link:
http://arxiv.org/abs/2407.04936
Authors:
Yuan, Yi, Jia, Dongya, Zhuang, Xiaobin, Chen, Yuanzhe, Liu, Zhengxi, Chen, Zhuo, Wang, Yuping, Wang, Yuxuan, Liu, Xubo, Kang, Xiyuan, Plumbley, Mark D., Wang, Wenwu
Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the simpli
External link:
http://arxiv.org/abs/2407.04416
Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting
External link:
http://arxiv.org/abs/2406.18847
In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we firs
External link:
http://arxiv.org/abs/2406.18187
Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe
External link:
http://arxiv.org/abs/2406.16058
Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a
External link:
http://arxiv.org/abs/2406.17800