Showing 1 - 10 of 77 for search: '"Liu, Haohe"'
Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe
External link:
http://arxiv.org/abs/2406.16058
Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a
External link:
http://arxiv.org/abs/2406.17800
Authors:
Zhang, Yiming, Xu, Xuenan, Du, Ruoyi, Liu, Haohe, Dong, Yuan, Tan, Zheng-Hua, Wang, Wenwu, Ma, Zhanyu
In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations.
External link:
http://arxiv.org/abs/2406.06295
Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate
External link:
http://arxiv.org/abs/2405.00233
Authors:
Yuan, Yi, Chen, Zhuo, Liu, Xubo, Liu, Haohe, Xu, Xuenan, Jia, Dongya, Chen, Yuanzhe, Plumbley, Mark D., Wang, Wenwu
Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal informati
External link:
http://arxiv.org/abs/2404.17806
Authors:
Ye, Zhen, Ju, Zeqian, Liu, Haohe, Tan, Xu, Chen, Jianyi, Lu, Yiwen, Sun, Peiwen, Pan, Jiahao, Bian, Weizhen, He, Shulin, Liu, Qifeng, Guo, Yike, Xue, Wei
Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using
External link:
http://arxiv.org/abs/2404.14700
Authors:
Liang, Jinhua, Zhang, Huan, Liu, Haohe, Cao, Yin, Kong, Qiuqiang, Liu, Xubo, Wang, Wenwu, Plumbley, Mark D., Phan, Huy, Benetos, Emmanouil
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Specifically, WavCraft describes the content of raw audio materials in natural lang
External link:
http://arxiv.org/abs/2403.09527
Authors:
Bai, Jisheng, Wang, Mou, Liu, Haohe, Yin, Han, Jia, Yafei, Huang, Siwei, Du, Yutong, Zhang, Dongzhe, Shi, Dongyuan, Gan, Woon-Seng, Plumbley, Mark D., Rahardja, Susanto, Xiang, Bin, Chen, Jianfeng
Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift betw
External link:
http://arxiv.org/abs/2402.02694
Diffusion models have demonstrated promising results in text-to-audio generation tasks. However, their practical usability is hindered by slow sampling speeds, limiting their applicability in high-throughput scenarios. To address this challenge, prog
External link:
http://arxiv.org/abs/2312.15628
Authors:
Zhang, Hejing, Zhu, Qiaoxi, Guan, Jian, Liu, Haohe, Xiao, Feiyang, Tian, Jiantong, Mei, Xinhao, Liu, Xubo, Wang, Wenwu
First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availabilit
External link:
http://arxiv.org/abs/2310.14173