Zobrazeno 1 - 10
of 512
pro vyhledávání: '"Plumbley, Mark D."'
Autor:
Yuan, Yi, Jia, Dongya, Zhuang, Xiaobin, Chen, Yuanzhe, Liu, Zhengxi, Chen, Zhuo, Wang, Yuping, Wang, Yuxuan, Liu, Xubo, Plumbley, Mark D., Wang, Wenwu
Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low qu
Externí odkaz:
http://arxiv.org/abs/2407.04416
Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate
Externí odkaz:
http://arxiv.org/abs/2405.00233
Autor:
Yuan, Yi, Chen, Zhuo, Liu, Xubo, Liu, Haohe, Xu, Xuenan, Jia, Dongya, Chen, Yuanzhe, Plumbley, Mark D., Wang, Wenwu
Contrastive language-audio pretraining~(CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal informati
Externí odkaz:
http://arxiv.org/abs/2404.17806
Autor:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang
Externí odkaz:
http://arxiv.org/abs/2403.09527
Autor:
Bai, Jisheng, Wang, Mou, Liu, Haohe, Yin, Han, Jia, Yafei, Huang, Siwei, Du, Yutong, Zhang, Dongzhe, Shi, Dongyuan, Gan, Woon-Seng, Plumbley, Mark D., Rahardja, Susanto, Xiang, Bin, Chen, Jianfeng
Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift betw
Externí odkaz:
http://arxiv.org/abs/2402.02694
Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities fo
Externí odkaz:
http://arxiv.org/abs/2312.16422
The auditory system plays a substantial role in shaping the overall human perceptual experience. While prevailing large language models (LLMs) and visual language models (VLMs) have shown their promise in solving a wide variety of vision and language
Externí odkaz:
http://arxiv.org/abs/2312.00249
Despite recent progress in text-to-audio (TTA) generation, we show that the state-of-the-art models, such as AudioLDM, trained on datasets with an imbalanced class distribution, such as AudioCaps, are biased in their generation performance. Specifica
Externí odkaz:
http://arxiv.org/abs/2309.08051
Audio super-resolution is a fundamental task that predicts high-frequency components for low-resolution audio, enhancing audio quality in digital applications. Previous methods have limitations such as the limited scope of audio types (e.g., music, s
Externí odkaz:
http://arxiv.org/abs/2309.07314
Autor:
Hu, Jinbo, Cao, Yin, Wu, Ming, Yang, Feiran, Yu, Ziying, Wang, Wenwu, Plumbley, Mark D., Yang, Jun
For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such
Externí odkaz:
http://arxiv.org/abs/2308.08847