Search results

Report

Text-Queried Target Sound Event Localization

Authors: Zhao, Jinzheng, Qian, Xinyuan, Xu, Yong, Liu, Haohe, Cao, Yin, Berghi, Davide, Wang, Wenwu

Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes…

External link: http://arxiv.org/abs/2406.16058

Report

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

Authors: Cui, Meng, Liu, Xubo, Liu, Haohe, Zhao, Jinzheng, Li, Daoliang, Wang, Wenwu

Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a…

External link: http://arxiv.org/abs/2406.17800

Report

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Authors: Zhang, Yiming, Xu, Xuenan, Du, Ruoyi, Liu, Haohe, Dong, Yuan, Tan, Zheng-Hua, Wang, Wenwu, Ma, Zhanyu

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations.

External link: http://arxiv.org/abs/2406.06295

Report

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Authors: Liu, Haohe, Xu, Xuenan, Yuan, Yi, Wu, Mengyue, Wang, Wenwu, Plumbley, Mark D.

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate…

External link: http://arxiv.org/abs/2405.00233

Report

T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

Authors: Yuan, Yi, Chen, Zhuo, Liu, Xubo, Liu, Haohe, Xu, Xuenan, Jia, Dongya, Chen, Yuanzhe, Plumbley, Mark D., Wang, Wenwu

Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information…

External link: http://arxiv.org/abs/2404.17806

Report

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Authors: Ye, Zhen, Ju, Zeqian, Liu, Haohe, Tan, Xu, Chen, Jianyi, Lu, Yiwen, Sun, Peiwen, Pan, Jiahao, Bian, Weizhen, He, Shulin, Liu, Qifeng, Guo, Yike, Xue, Wei

Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using…

External link: http://arxiv.org/abs/2404.14700

Report

WavCraft: Audio Editing and Generation with Large Language Models

Authors: Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil

We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural language…

External link: http://arxiv.org/abs/2403.09527

Report

Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

Authors: Bai, Jisheng, Wang, Mou, Liu, Haohe, Yin, Han, Jia, Yafei, Huang, Siwei, Du, Yutong, Zhang, Dongzhe, Shi, Dongyuan, Gan, Woon-Seng, Plumbley, Mark D., Rahardja, Susanto, Xiang, Bin, Chen, Jianfeng

Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between…

External link: http://arxiv.org/abs/2402.02694

Report

Balanced SNR-Aware Distillation for Guided Text-to-Audio Generation

Authors: Liu, Bingzhi, Cao, Yin, Liu, Haohe, Zhou, Yi

Diffusion models have demonstrated promising results in text-to-audio generation tasks. However, their practical usability is hindered by slow sampling speeds, limiting their applicability in high-throughput scenarios. To address this challenge, prog…

External link: http://arxiv.org/abs/2312.15628

Report

First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation

Authors: Zhang, Hejing, Zhu, Qiaoxi, Guan, Jian, Liu, Haohe, Xiao, Feiyang, Tian, Jiantong, Mei, Xinhao, Liu, Xubo, Wang, Wenwu

First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability…

External link: http://arxiv.org/abs/2310.14173
