Search results - "Plumbley, Mark D."

Report

Improving Audio Generation with Visual Enhanced Caption

Authors: Yuan, Yi; Jia, Dongya; Zhuang, Xiaobin; Chen, Yuanzhe; Liu, Zhengxi; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Liu, Xubo; Plumbley, Mark D.; Wang, Wenwu

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low qu

External link: http://arxiv.org/abs/2407.04416

Report

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Authors: Liu, Haohe; Xu, Xuenan; Yuan, Yi; Wu, Mengyue; Wang, Wenwu; Plumbley, Mark D.

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate

External link: http://arxiv.org/abs/2405.00233

Report

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Authors: Yuan, Yi; Chen, Zhuo; Liu, Xubo; Liu, Haohe; Xu, Xuenan; Jia, Dongya; Chen, Yuanzhe; Plumbley, Mark D.; Wang, Wenwu

Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal informati

External link: http://arxiv.org/abs/2404.17806

Report

WavCraft: Audio Editing and Generation with Large Language Models

Authors: Liang, Jinhua; Zhang, Huan; Liu, Haohe; Cao, Yin; Kong, Qiuqiang; Liu, Xubo; Wang, Wenwu; Plumbley, Mark D.; Phan, Huy; Benetos, Emmanouil

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang

External link: http://arxiv.org/abs/2403.09527

Report

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Authors: Bai, Jisheng; Wang, Mou; Liu, Haohe; Yin, Han; Jia, Yafei; Huang, Siwei; Du, Yutong; Zhang, Dongzhe; Shi, Dongyuan; Gan, Woon-Seng; Plumbley, Mark D.; Rahardja, Susanto; Xiang, Bin; Chen, Jianfeng

Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift betw

External link: http://arxiv.org/abs/2402.02694

Report

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Kong, Qiuqiang; Yang, Feiran; Plumbley, Mark D.; Yang, Jun

Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities fo

External link: http://arxiv.org/abs/2312.16422

Report

Acoustic Prompt Tuning: Empowering Large Language Models with Audition Capabilities

Authors: Liang, Jinhua; Liu, Xubo; Wang, Wenwu; Plumbley, Mark D.; Phan, Huy; Benetos, Emmanouil

The auditory system plays a substantial role in shaping the overall human perceptual experience. While prevailing large language models (LLMs) and visual language models (VLMs) have shown their promise in solving a wide variety of vision and language

External link: http://arxiv.org/abs/2312.00249

Report

Retrieval-Augmented Text-to-Audio Generation

Authors: Yuan, Yi; Liu, Haohe; Liu, Xubo; Huang, Qiushi; Plumbley, Mark D.; Wang, Wenwu

Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such as AudioCaps, are biased in their generation performance. Specifica

External link: http://arxiv.org/abs/2309.08051

Report

AudioSR: Versatile Audio Super-resolution at Scale

Authors: Liu, Haohe; Chen, Ke; Tian, Qiao; Wang, Wenwu; Plumbley, Mark D.

Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, s

External link: http://arxiv.org/abs/2309.07314

Report

META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Yang, Feiran; Yu, Ziying; Wang, Wenwu; Plumbley, Mark D.; Yang, Jun

For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such

External link: http://arxiv.org/abs/2308.08847
