Showing 1 - 10 of 241 for search: '"Liu, Xubo"'
Author:
Yuan, Yi, Jia, Dongya, Zhuang, Xiaobin, Chen, Yuanzhe, Liu, Zhengxi, Chen, Zhuo, Wang, Yuping, Wang, Yuxuan, Liu, Xubo, Plumbley, Mark D., Wang, Wenwu
Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low qu
External link:
http://arxiv.org/abs/2407.04416
Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting
External link:
http://arxiv.org/abs/2406.18847
In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we firs
External link:
http://arxiv.org/abs/2406.18187
Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a
External link:
http://arxiv.org/abs/2406.17800
Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities. VI-ReID is a challenging task due to the large differences in individual appearance under different modalities. E
External link:
http://arxiv.org/abs/2405.12713
Author:
Deng, Qixin, Yang, Qikai, Yuan, Ruibin, Huang, Yipeng, Wang, Yi, Liu, Xubo, Tian, Zeyue, Pan, Jiahao, Zhang, Ge, Lin, Hanfeng, Li, Yizhi, Ma, Yinghao, Fu, Jie, Lin, Chenghua, Benetos, Emmanouil, Wang, Wenwu, Xia, Guangyu, Xue, Wei, Guo, Yike
Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM
External link:
http://arxiv.org/abs/2404.18081
Author:
Yuan, Yi, Chen, Zhuo, Liu, Xubo, Liu, Haohe, Xu, Xuenan, Jia, Dongya, Chen, Yuanzhe, Plumbley, Mark D., Wang, Wenwu
Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal informati
External link:
http://arxiv.org/abs/2404.17806
Author:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang
External link:
http://arxiv.org/abs/2403.09527
The auditory system plays a substantial role in shaping the overall human perceptual experience. While prevailing large language models (LLMs) and visual language models (VLMs) have shown their promise in solving a wide variety of vision and language
External link:
http://arxiv.org/abs/2312.00249
Author:
Zhang, Hejing, Zhu, Qiaoxi, Guan, Jian, Liu, Haohe, Xiao, Feiyang, Tian, Jiantong, Mei, Xinhao, Liu, Xubo, Wang, Wenwu
First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availabilit
External link:
http://arxiv.org/abs/2310.14173