Showing 1 - 10 of 349 for search: '"Liang, Jinhua"'
Our study investigates an approach for understanding musical performances through the lens of audio encoding models, focusing on the domain of solo Western classical piano music. Compared to composition-level attribute understanding such as key or ge…
External link:
http://arxiv.org/abs/2407.04518
Author:
Zhang, Huan, Chowdhury, Shreyan, Cancino-Chacón, Carlos Eduardo, Liang, Jinhua, Dixon, Simon, Widmer, Gerhard
In the pursuit of developing expressive music performance models using artificial intelligence, this paper introduces DExter, a new approach leveraging diffusion probabilistic models to render Western classical piano performances. In this approach, p…
External link:
http://arxiv.org/abs/2406.14850
Detecting the presence of animal vocalisations in nature is essential to study animal populations and their behaviors. A recent development in the field is the introduction of the task known as few-shot bioacoustic sound event detection, which aims t…
External link:
http://arxiv.org/abs/2403.18638
Author:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang…
External link:
http://arxiv.org/abs/2403.09527
The auditory system plays a substantial role in shaping the overall human perceptual experience. While prevailing large language models (LLMs) and visual language models (VLMs) have shown their promise in solving a wide variety of vision and language…
External link:
http://arxiv.org/abs/2312.00249
Author:
Liu, Xubo, Zhu, Zhongkai, Liu, Haohe, Yuan, Yi, Cui, Meng, Huang, Qiushi, Liang, Jinhua, Cao, Yin, Kong, Qiuqiang, Plumbley, Mark D., Wang, Wenwu
Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their pot…
External link:
http://arxiv.org/abs/2307.14335
Author:
Liang, Jinhua, Liu, Xubo, Liu, Haohe, Phan, Huy, Benetos, Emmanouil, Plumbley, Mark D., Wang, Wenwu
We presented the Treff adapter, a training-efficient adapter for CLAP, to boost zero-shot classification performance by making use of a small set of labelled data. Specifically, we designed CALM to retrieve the probability distribution of text-audio…
External link:
http://arxiv.org/abs/2305.17719
Deep neural networks have recently achieved breakthroughs in sound generation with text prompts. Despite their promising performance, current text-to-sound generation models face issues on small-scale datasets (e.g., overfitting), significantly limit…
External link:
http://arxiv.org/abs/2303.03857
Everyday sound recognition aims to infer types of sound events in audio streams. While many works succeeded in training models with high performance in a fully-supervised manner, they are still restricted to the demand of large quantities of labelled…
External link:
http://arxiv.org/abs/2212.08952
Author:
Liu, Yang, Zhao, Wanqi, Xi, Yonglan, Wang, Shen, Liang, Jinhua, Zeng, Yang, Dong, Weiliang, Chen, Kequan, Jia, Honghua, Wu, Xiayuan
Published in:
Applied Energy, 15 March 2024, Vol. 358