Search results

Report

Text-Queried Target Sound Event Localization

Authors: Zhao, Jinzheng; Qian, Xinyuan; Xu, Yong; Liu, Haohe; Cao, Yin; Berghi, Davide; Wang, Wenwu

Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe

External link: http://arxiv.org/abs/2406.16058


Report

Towards Out-of-Distribution Detection in Vocoder Recognition via Latent Feature Reconstruction

Authors: Du, Renmingyue; Yao, Jixun; Kong, Qiuqiang; Cao, Yin

Advancements in synthesized speech have created a growing threat of impersonation, making it crucial to develop deepfake algorithm recognition. One significant aspect is out-of-distribution (OOD) detection, which has gained notable attention due to i

External link: http://arxiv.org/abs/2406.02233


Report

WavCraft: Audio Editing and Generation with Large Language Models

Authors: Liang, Jinhua; Zhang, Huan; Liu, Haohe; Cao, Yin; Kong, Qiuqiang; Liu, Xubo; Wang, Wenwu; Plumbley, Mark D.; Phan, Huy; Benetos, Emmanouil

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang

External link: http://arxiv.org/abs/2403.09527


Report

EDTC: enhance depth of text comprehension in automated audio captioning

Authors: Tan, Liwen; Cao, Yin; Zhou, Yi

Modality discrepancies have perpetually posed significant challenges within the realm of Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models in comprehending text information plays a pivotal role in establishing a

External link: http://arxiv.org/abs/2402.17259


Report

Selective-Memory Meta-Learning with Environment Representations for Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Kong, Qiuqiang; Yang, Feiran; Plumbley, Mark D.; Yang, Jun

Environment shifts and conflicts present significant challenges for learning-based sound event localization and detection (SELD) methods. SELD systems, when trained in particular acoustic settings, often show restricted generalization capabilities fo

External link: http://arxiv.org/abs/2312.16422


Report

Balanced SNR-Aware Distillation for Guided Text-to-Audio Generation

Authors: Liu, Bingzhi; Cao, Yin; Liu, Haohe; Zhou, Yi

Diffusion models have demonstrated promising results in text-to-audio generation tasks. However, their practical usability is hindered by slow sampling speeds, limiting their applicability in high-throughput scenarios. To address this challenge, prog

External link: http://arxiv.org/abs/2312.15628


Report

META-SELD: Meta-Learning for Fast Adaptation to the new environment in Sound Event Localization and Detection

Authors: Hu, Jinbo; Cao, Yin; Wu, Ming; Yang, Feiran; Yu, Ziying; Wang, Wenwu; Plumbley, Mark D.; Yang, Jun

For learning-based sound event localization and detection (SELD) methods, different acoustic environments in the training and test sets may result in large performance differences in the validation and evaluation stages. Different environments, such

External link: http://arxiv.org/abs/2308.08847


E-book

From Policemen to Revolutionaries. [electronic resource]

Author: Cao, Yin

External link: KNAV e-book collection. Registered users: full text online for 5 minutes, further access on request.

Report

WavJourney: Compositional Audio Creation with Large Language Models

Authors: Liu, Xubo; Zhu, Zhongkai; Liu, Haohe; Yuan, Yi; Cui, Meng; Huang, Qiushi; Liang, Jinhua; Cao, Yin; Kong, Qiuqiang; Plumbley, Mark D.; Wang, Wenwu

Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their pot

External link: http://arxiv.org/abs/2307.14335


Report

Neural Sound Field Decomposition with Super-resolution of Sound Direction

Authors: Kong, Qiuqiang; Liu, Shilei; Shi, Junjie; Ye, Xuzhou; Cao, Yin; Zhu, Qiaoxi; Xu, Yong; Wang, Yuxuan

Sound field decomposition predicts waveforms in arbitrary directions using signals from a limited number of microphones as inputs. Sound field decomposition is fundamental to downstream tasks, including source localization, source separation, and spa

External link: http://arxiv.org/abs/2210.12345

