Showing 1 - 10 of 141 results for the search: '"Kong, Qiuqiang"'
Music scores are written representations of music and contain rich information about musical components. The visual information on music scores includes notes, rests, staff lines, clefs, dynamics, and articulations. This visual information in music scores …
External link:
http://arxiv.org/abs/2406.11462
Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to …
External link:
http://arxiv.org/abs/2406.02233
Authors:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural language …
External link:
http://arxiv.org/abs/2403.09527
Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities for …
External link:
http://arxiv.org/abs/2312.16422
Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags. In this work, we propose a zero-shot music tagging …
External link:
http://arxiv.org/abs/2310.10159
Authors:
Li, Dichucheng, Ma, Yinghao, Wei, Weixing, Kong, Qiuqiang, Wu, Yulun, Che, Mingjin, Xia, Fan, Benetos, Emmanouil, Li, Wei
Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to …
External link:
http://arxiv.org/abs/2310.09853
Authors:
Guan, Jian, Liu, Youde, Kong, Qiuqiang, Xiao, Feiyang, Zhu, Qiaoxi, Tian, Jiantong, Wang, Wenwu
Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream approaches. However, the AE-based …
External link:
http://arxiv.org/abs/2310.08950
Music source separation (MSS) aims to separate a music recording into multiple musically distinct stems, such as vocals, bass, drums, and more. Recently, deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks …
External link:
http://arxiv.org/abs/2309.02612
Authors:
Liu, Haohe, Yuan, Yi, Liu, Xubo, Mei, Xinhao, Kong, Qiuqiang, Tian, Qiao, Wang, Yuping, Wang, Wenwu, Wang, Yuxuan, Plumbley, Mark D.
Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from …
External link:
http://arxiv.org/abs/2308.05734
Authors:
Liu, Xubo, Kong, Qiuqiang, Zhao, Yan, Liu, Haohe, Yuan, Yi, Liu, Yuzhuo, Xia, Rui, Wang, Yuxuan, Plumbley, Mark D., Wang, Wenwu
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface …
External link:
http://arxiv.org/abs/2308.05037