Search results

Report

EDTC: enhance depth of text comprehension in automated audio captioning

Authors: Tan, Liwen; Cao, Yin; Zhou, Yi

Modality discrepancies have perpetually posed significant challenges within the realm of Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models in comprehending text information plays a pivotal role in establishing a

External link: http://arxiv.org/abs/2402.17259

Report

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Kong, Qiuqiang; Yang, Feiran; Plumbley, Mark D.; Yang, Jun

Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities fo

External link: http://arxiv.org/abs/2312.16422

Report

Balanced SNR-Aware Distillation for Guided Text-to-Audio Generation

Authors: Liu, Bingzhi; Cao, Yin; Liu, Haohe; Zhou, Yi

Diffusion models have demonstrated promising results in text-to-audio generation tasks. However, their practical usability is hindered by slow sampling speeds, limiting their applicability in high-throughput scenarios. To address this challenge, prog

External link: http://arxiv.org/abs/2312.15628

Report

META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Yang, Feiran; Yu, Ziying; Wang, Wenwu; Plumbley, Mark D.; Yang, Jun

For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such

External link: http://arxiv.org/abs/2308.08847

Report

WavJourney: Compositional Audio Creation with Large Language Models

Authors: Liu, Xubo; Zhu, Zhongkai; Liu, Haohe; Yuan, Yi; Cui, Meng; Huang, Qiushi; Liang, Jinhua; Cao, Yin; Kong, Qiuqiang; Plumbley, Mark D.; Wang, Wenwu

Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their pot

External link: http://arxiv.org/abs/2307.14335

Report

Neural Sound Field Decomposition with Super-resolution of Sound Direction

Authors: Kong, Qiuqiang; Liu, Shilei; Shi, Junjie; Ye, Xuzhou; Cao, Yin; Zhu, Qiaoxi; Xu, Yong; Wang, Yuxuan

Sound field decomposition predicts waveforms in arbitrary directions using signals from a limited number of microphones as inputs. Sound field decomposition is fundamental to downstream tasks, including source localization, source separation, and spa

External link: http://arxiv.org/abs/2210.12345

Report

Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Kong, Qiuqiang; Yang, Feiran; Plumbley, Mark D.; Yang, Jun

Sound event localization and detection (SELD) is a joint task of sound event detection and direction-of-arrival estimation. In DCASE 2022 Task 3, types of data transform from computationally generated spatial recordings to recordings of real-sound sc

External link: http://arxiv.org/abs/2209.01802

Report

A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Kong, Qiuqiang; Yang, Feiran; Plumbley, Mark D.; Yang, Jun

Polyphonic sound event localization and detection (SELD) aims at detecting types of sound events with corresponding temporal activities and spatial locations. In this paper, a track-wise ensemble event independent network with a novel data augmentati

External link: http://arxiv.org/abs/2203.10228

Report

ARCA23K: An audio dataset for investigating open-set label noise

Authors: Iqbal, Turab; Cao, Yin; Bailey, Andrew; Plumbley, Mark D.; Wang, Wenwu

The availability of audio data on sound sharing platforms such as Freesound gives users access to large amounts of annotated audio. Utilising such data for training is becoming increasingly popular, but the problem of label noise that is often preval

External link: http://arxiv.org/abs/2109.09227

Report

Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation

Authors: Kong, Qiuqiang; Cao, Yin; Liu, Haohe; Choi, Keunwoo; Wang, Yuxuan

Published in: International Society for Music Information Retrieval (ISMIR) 2021

Deep neural network based methods have been successfully applied to music source separation. They typically learn a mapping from a mixture spectrogram to a set of source spectrograms, all with magnitudes only. This approach has several limitations: 1

External link: http://arxiv.org/abs/2109.05418
