Search results

Report

Improving Audio Generation with Visual Enhanced Caption

Authors: Yuan, Yi; Jia, Dongya; Zhuang, Xiaobin; Chen, Yuanzhe; Liu, Zhengxi; Chen, Zhuo; Wang, Yuping; Wang, Yuxuan; Liu, Xubo; Plumbley, Mark D.; Wang, Wenwu

Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the low qu

External link: http://arxiv.org/abs/2407.04416

Report

Learning Retrieval Augmentation for Personalized Dialogue Generation

Authors: Huang, Qiushi; Fu, Shuai; Liu, Xubo; Wang, Wenwu; Ko, Tom; Zhang, Yu; Tang, Lilian

Personalized dialogue generation, focusing on generating highly tailored responses by leveraging persona profiles and dialogue context, has gained significant attention in conversational AI applications. However, persona profiles, a prevalent setting

External link: http://arxiv.org/abs/2406.18847

Report

Selective Prompting Tuning for Personalized Conversations with LLMs

Authors: Huang, Qiushi; Liu, Xubo; Ko, Tom; Wu, Bo; Wang, Wenwu; Zhang, Yu; Tang, Lilian

In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we firs

External link: http://arxiv.org/abs/2406.18187

Report

Text-Queried Target Sound Event Localization

Authors: Zhao, Jinzheng; Qian, Xinyuan; Xu, Yong; Liu, Haohe; Cao, Yin; Berghi, Davide; Wang, Wenwu

Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe

External link: http://arxiv.org/abs/2406.16058

Report

Fish Tracking, Counting, and Behaviour Analysis in Digital Aquaculture: A Comprehensive Review

Authors: Cui, Meng; Liu, Xubo; Liu, Haohe; Zhao, Jinzheng; Li, Daoliang; Wang, Wenwu

Digital aquaculture leverages advanced technologies and data-driven methods, providing substantial benefits over traditional aquaculture practices. Fish tracking, counting, and behaviour analysis are crucial components of digital aquaculture, which a

External link: http://arxiv.org/abs/2406.17800

Report

Impact of the Top SiO2 Interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

Authors: Hu, Tao; Shao, Xianzhou; Bai, Mingkai; Jia, Xinpei; Dai, Saifei; Sun, Xiaoqing; Han, Runhao; Yang, Jia; Ke, Xiaoyu; Tian, Fengbin; Yang, Shuai; Chai, Junshuai; Xu, Hao; Wang, Xiaolei; Wang, Wenwu; Ye, Tianchun

We study the impact of top SiO2 interlayer thickness on the memory window (MW) of Si channel ferroelectric field-effect transistor (FeFET) with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. We find that the MW increases with the increasing th

External link: http://arxiv.org/abs/2406.15478

Report

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Authors: Zhang, Yiming; Xu, Xuenan; Du, Ruoyi; Liu, Haohe; Dong, Yuan; Tan, Zheng-Hua; Wang, Wenwu; Ma, Zhanyu

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations.

External link: http://arxiv.org/abs/2406.06295

Report

Soundscape Captioning using Sound Affective Quality Network and Large Language Model

Authors: Hou, Yuanbo; Ren, Qiaoqiao; Mitchell, Andrew; Wang, Wenwu; Kang, Jian; Belpaeme, Tony; Botteldooren, Dick

We live in a rich and varied acoustic world, which is experienced by individuals or communities as a soundscape. Computational auditory scene analysis, disentangling acoustic scenes by detecting and classifying events, focuses on objective attributes

External link: http://arxiv.org/abs/2406.05914

Report

Regime Learning for Differentiable Particle Filters

Authors: Brady, John-Joseph; Luo, Yuhui; Wang, Wenwu; Elvira, Victor; Li, Yunpeng

Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between

External link: http://arxiv.org/abs/2405.04865

Report

SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound

Authors: Liu, Haohe; Xu, Xuenan; Yuan, Yi; Wu, Mengyue; Wang, Wenwu; Plumbley, Mark D.

Large language models (LLMs) have significantly advanced audio processing through audio codecs that convert audio into discrete tokens, enabling the application of language modelling techniques to audio data. However, traditional codecs often operate

External link: http://arxiv.org/abs/2405.00233
