Search results

Report

Improving Audio Generation with Visual Enhanced Caption

Authors: Yuan, Yi; Jia, Dongya; Zhuang, Xiaobin; Chen, Yuanzhe; Liu, Zhengxi; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Liu, Xubo; Plumbley, Mark D.; Wang, Wenwu

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low qu

External link: http://arxiv.org/abs/2407.04416

Report

Learning Retrieval Augmentation for Personalized Dialogue Generation

Authors: Huang, Qiushi; Fu, Shuai; Liu, Xubo; Wang, Wenwu; Ko, Tom; Zhang, Yu; Tang, Lilian

Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting

External link: http://arxiv.org/abs/2406.18847

Report

Selective Prompting Tuning for Personalized Conversations with LLMs

Authors: Huang, Qiushi; Liu, Xubo; Ko, Tom; Wu, Bo; Wang, Wenwu; Zhang, Yu; Tang, Lilian

In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we firs

External link: http://arxiv.org/abs/2406.18187

Report

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

Authors: Cui, Meng; Liu, Xubo; Liu, Haohe; Zhao, Jinzheng; Li, Daoliang; Wang, Wenwu

Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a

External link: http://arxiv.org/abs/2406.17800

Report

Dynamic Identity-Guided Attention Network for Visible-Infrared Person Re-identification

Authors: Gao, Peng; Lee, Yujian; Zhang, Hui; Liu, Xubo; Hu, Yiyang; Jing, Guquan

Visible-infrared person re-identification (VI-ReID) aims to match people with the same identity between visible and infrared modalities. VI-ReID is a challenging task due to the large differences in individual appearance under different modalities. E

External link: http://arxiv.org/abs/2405.12713

Report

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Authors: Deng, Qixin; Yang, Qikai; Yuan, Ruibin; Huang, Yipeng; Wang, Yi; Liu, Xubo; Tian, Zeyue; Pan, Jiahao; Zhang, Ge; Lin, Hanfeng; Li, Yizhi; Ma, Yinghao; Fu, Jie; Lin, Chenghua; Benetos, Emmanouil; Wang, Wenwu; Xia, Guangyu; Xue, Wei; Guo, Yike

Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM

External link: http://arxiv.org/abs/2404.18081

Report

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Authors: Yuan, Yi; Chen, Zhuo; Liu, Xubo; Liu, Haohe; Xu, Xuenan; Jia, Dongya; Chen, Yuanzhe; Plumbley, Mark D.; Wang, Wenwu

Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal informati

External link: http://arxiv.org/abs/2404.17806

Report

WavCraft: Audio Editing and Generation with Large Language Models

Authors: Liang, Jinhua; Zhang, Huan; Liu, Haohe; Cao, Yin; Kong, Qiuqiang; Liu, Xubo; Wang, Wenwu; Plumbley, Mark D.; Phan, Huy; Benetos, Emmanouil

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang

External link: http://arxiv.org/abs/2403.09527

Report

Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities

Authors: Liang, Jinhua; Liu, Xubo; Wang, Wenwu; Plumbley, Mark D.; Phan, Huy; Benetos, Emmanouil

The auditory system plays a substantial role in shaping the overall human perceptual experience. While prevailing large language models (LLMs) and visual language models (VLMs) have shown their promise in solving a wide variety of vision and language

External link: http://arxiv.org/abs/2312.00249

Report

First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation

Authors: Zhang, Hejing; Zhu, Qiaoxi; Guan, Jian; Liu, Haohe; Xiao, Feiyang; Tian, Jiantong; Mei, Xinhao; Liu, Xubo; Wang, Wenwu

First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availabilit

External link: http://arxiv.org/abs/2310.14173
