Search results

Report

Differentiable Interacting Multiple Model Particle Filtering

Authors: Brady, John-Joseph; Luo, Yuhui; Wang, Wenwu; Elvira, Víctor; Li, Yunpeng

We propose a sequential Monte Carlo algorithm for parameter learning when the studied model exhibits random discontinuous jumps in behaviour. To facilitate the learning of high dimensional parameter sets, such as those associated to neural networks,

External link: http://arxiv.org/abs/2410.00620

Report

FlowSep: Language-Queried Sound Separation with Rectified Flow Matching

Authors: Yuan, Yi; Liu, Xubo; Liu, Haohe; Plumbley, Mark D.; Wang, Wenwu

Language-queried audio source separation (LASS) focuses on separating sounds using textual descriptions of the desired sources. Current methods mainly use discriminative approaches, such as time-frequency masking, to separate target sounds and minimi

External link: http://arxiv.org/abs/2409.07614

Report

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

Authors: Xu, Xuenan; Liu, Haohe; Wu, Mengyue; Wang, Wenwu; Plumbley, Mark D.

Significant improvement has been achieved in automated audio captioning (AAC) with recent models. However, these models have become increasingly large as their performance is enhanced. In this work, we propose a knowledge distillation (KD) framework

External link: http://arxiv.org/abs/2407.14329

Report

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Authors: Zhao, Junqi; Liu, Xubo; Zhao, Jinzheng; Yuan, Yi; Kong, Qiuqiang; Plumbley, Mark D.; Wang, Wenwu

Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging de

External link: http://arxiv.org/abs/2407.11745

Report

A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

Authors: Xiao, Feiyang; Guan, Jian; Zhu, Qiaoxi; Liu, Xubo; Wang, Wenbo; Qi, Shuhan; Zhang, Kejia; Sun, Jianyuan; Wang, Wenwu

Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, t

External link: http://arxiv.org/abs/2407.04936

Report

Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions

Authors: Yuan, Yi; Jia, Dongya; Zhuang, Xiaobin; Chen, Yuanzhe; Liu, Zhengxi; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Liu, Xubo; Kang, Xiyuan; Plumbley, Mark D.; Wang, Wenwu

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the simpli

External link: http://arxiv.org/abs/2407.04416

Report

Learning Retrieval Augmentation for Personalized Dialogue Generation

Authors: Huang, Qiushi; Fu, Shuai; Liu, Xubo; Wang, Wenwu; Ko, Tom; Zhang, Yu; Tang, Lilian

Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting

External link: http://arxiv.org/abs/2406.18847

Report

Selective Prompting Tuning for Personalized Conversations with LLMs

Authors: Huang, Qiushi; Liu, Xubo; Ko, Tom; Wu, Bo; Wang, Wenwu; Zhang, Yu; Tang, Lilian

In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we firs

External link: http://arxiv.org/abs/2406.18187

Report

Text-Queried Target Sound Event Localization

Authors: Zhao, Jinzheng; Qian, Xinyuan; Xu, Yong; Liu, Haohe; Cao, Yin; Berghi, Davide; Wang, Wenwu

Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe

External link: http://arxiv.org/abs/2406.16058

Report

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

Authors: Cui, Meng; Liu, Xubo; Liu, Haohe; Zhao, Jinzheng; Li, Daoliang; Wang, Wenwu

Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a

External link: http://arxiv.org/abs/2406.17800
