Showing 1 - 10 of 25 for search: '"Cao, Yin"'
Modality discrepancies have perpetually posed significant challenges within the realm of Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models in comprehending text information plays a pivotal role in establishing a
External link:
http://arxiv.org/abs/2402.17259
Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities fo
External link:
http://arxiv.org/abs/2312.16422
Diffusion models have demonstrated promising results in text-to-audio generation tasks. However, their practical usability is hindered by slow sampling speeds, limiting their applicability in high-throughput scenarios. To address this challenge, prog
External link:
http://arxiv.org/abs/2312.15628
Authors:
Hu, Jinbo; Cao, Yin; Wu, Ming; Yang, Feiran; Yu, Ziying; Wang, Wenwu; Plumbley, Mark D.; Yang, Jun
For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such
External link:
http://arxiv.org/abs/2308.08847
Authors:
Liu, Xubo; Zhu, Zhongkai; Liu, Haohe; Yuan, Yi; Cui, Meng; Huang, Qiushi; Liang, Jinhua; Cao, Yin; Kong, Qiuqiang; Plumbley, Mark D.; Wang, Wenwu
Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their pot
External link:
http://arxiv.org/abs/2307.14335
Authors:
Kong, Qiuqiang; Liu, Shilei; Shi, Junjie; Ye, Xuzhou; Cao, Yin; Zhu, Qiaoxi; Xu, Yong; Wang, Yuxuan
Sound field decomposition predicts waveforms in arbitrary directions using signals from a limited number of microphones as inputs. Sound field decomposition is fundamental to downstream tasks, including source localization, source separation, and spa
External link:
http://arxiv.org/abs/2210.12345
Sound event localization and detection (SELD) is a joint task of sound event detection and direction-of-arrival estimation. In DCASE 2022 Task 3, types of data transform from computationally generated spatial recordings to recordings of real-sound sc
External link:
http://arxiv.org/abs/2209.01802
Polyphonic sound event localization and detection (SELD) aims at detecting types of sound events with corresponding temporal activities and spatial locations. In this paper, a track-wise ensemble event independent network with a novel data augmentati
External link:
http://arxiv.org/abs/2203.10228
The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Utilising such data for training is becoming increasingly popular, but the problem of label noise that is often preval
External link:
http://arxiv.org/abs/2109.09227
Published in:
International Society for Music Information Retrieval (ISMIR) 2021
Deep neural network based methods have been successfully applied to music source separation. They typically learn a mapping from a mixture spectrogram to a set of source spectrograms, all with magnitudes only. This approach has several limitations: 1
External link:
http://arxiv.org/abs/2109.05418