Showing 1 - 10 of 141 results for the search: '"Kong, Qiuqiang"'
Music scores are written representations of music and contain rich information about musical components. The visual information on music scores includes notes, rests, staff lines, clefs, dynamics, and articulations. This visual information in music scores …
External link:
http://arxiv.org/abs/2406.11462
Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to …
External link:
http://arxiv.org/abs/2406.02233
Authors:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural language …
External link:
http://arxiv.org/abs/2403.09527
Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities for …
External link:
http://arxiv.org/abs/2312.16422
Music tagging is a task to predict the tags of music recordings. However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags. In this work, we propose a zero-shot music tagging …
External link:
http://arxiv.org/abs/2310.10159
Authors:
Li, Dichucheng, Ma, Yinghao, Wei, Weixing, Kong, Qiuqiang, Wu, Yulun, Che, Mingjin, Xia, Fan, Benetos, Emmanouil, Li, Wei
Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to …
External link:
http://arxiv.org/abs/2310.09853
Authors:
Guan, Jian, Liu, Youde, Kong, Qiuqiang, Xiao, Feiyang, Zhu, Qiaoxi, Tian, Jiantong, Wang, Wenwu
Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream approaches. However, the AE-based …
External link:
http://arxiv.org/abs/2310.08950
Music source separation (MSS) aims to separate a music recording into multiple musically distinct stems, such as vocals, bass, drums, and more. Recently, deep learning approaches such as convolutional neural networks (CNNs) and recurrent neural networks …
External link:
http://arxiv.org/abs/2309.02612
Authors:
Liu, Haohe, Yuan, Yi, Liu, Xubo, Mei, Xinhao, Kong, Qiuqiang, Tian, Qiao, Wang, Yuping, Wang, Wenwu, Wang, Yuxuan, Plumbley, Mark D.
Although audio generation shares commonalities across different types of audio, such as speech, music, and sound effects, designing models for each type requires careful consideration of specific objectives and biases that can significantly differ from …
External link:
http://arxiv.org/abs/2308.05734
Authors:
Liu, Xubo, Kong, Qiuqiang, Zhao, Yan, Liu, Haohe, Yuan, Yi, Liu, Yuzhuo, Xia, Rui, Wang, Yuxuan, Plumbley, Mark D., Wang, Wenwu
Language-queried audio source separation (LASS) is a new paradigm for computational auditory scene analysis (CASA). LASS aims to separate a target sound from an audio mixture given a natural language query, which provides a natural and scalable interface …
External link:
http://arxiv.org/abs/2308.05037