Showing 1 - 10 of 349 for search: '"Liang, Jinhua"'
Our study investigates an approach for understanding musical performances through the lens of audio encoding models, focusing on the domain of solo Western classical piano music. Compared to composition-level attribute understanding such as key or ge…
External link:
http://arxiv.org/abs/2407.04518
Author:
Zhang, Huan, Chowdhury, Shreyan, Cancino-Chacón, Carlos Eduardo, Liang, Jinhua, Dixon, Simon, Widmer, Gerhard
In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, p…
External link:
http://arxiv.org/abs/2406.14850
Detecting the presence of animal vocalisations in nature is essential to study animal populations and their behaviors. A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims t…
External link:
http://arxiv.org/abs/2403.18638
Author:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang…
External link:
http://arxiv.org/abs/2403.09527
The auditory system plays a substantial role in shaping the overall human perceptual experience. While prevailing large language models (LLMs) and visual language models (VLMs) have shown their promise in solving a wide variety of vision and language…
External link:
http://arxiv.org/abs/2312.00249
Author:
Liu, Xubo, Zhu, Zhongkai, Liu, Haohe, Yuan, Yi, Cui, Meng, Huang, Qiushi, Liang, Jinhua, Cao, Yin, Kong, Qiuqiang, Plumbley, Mark D., Wang, Wenwu
Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their pot…
External link:
http://arxiv.org/abs/2307.14335
Author:
Liang, Jinhua, Liu, Xubo, Liu, Haohe, Phan, Huy, Benetos, Emmanouil, Plumbley, Mark D., Wang, Wenwu
We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data. Specifically, we designed CALM to retrieve the probability distribution of text-audio…
External link:
http://arxiv.org/abs/2305.17719
Deep neural networks have recently achieved breakthroughs in sound generation with text prompts. Despite their promising performance, current text-to-sound generation models face issues on small-scale datasets (e.g., overfitting), significantly limit…
External link:
http://arxiv.org/abs/2303.03857
Everyday sound recognition aims to infer types of sound events in audio streams. While many works succeeded in training models with high performance in a fully-supervised manner, they are still restricted to the demand of large quantities of labelled…
External link:
http://arxiv.org/abs/2212.08952
Author:
Liu, Yang, Zhao, Wanqi, Xi, Yonglan, Wang, Shen, Liang, Jinhua, Zeng, Yang, Dong, Weiliang, Chen, Kequan, Jia, Honghua, Wu, Xiayuan
Published in:
Applied Energy, 15 March 2024, Vol. 358