Search results - "Sun, Jianyuan"

Report

A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

Authors: Xiao, Feiyang, Guan, Jian, Zhu, Qiaoxi, Liu, Xubo, Wang, Wenbo, Qi, Shuhan, Zhang, Kejia, Sun, Jianyuan, Wang, Wenwu

Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, t

External link: http://arxiv.org/abs/2407.04936

Report

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

Authors: Zhao, Jinzheng, Xu, Yong, Qian, Xinyuan, Berghi, Davide, Wu, Peipei, Cui, Meng, Sun, Jianyuan, Jackson, Philip J. B., Wang, Wenwu

Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visu

External link: http://arxiv.org/abs/2310.14778

Report

Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning

Authors: Sun, Jianyuan, Liu, Xubo, Mei, Xinhao, Kılıç, Volkan, Plumbley, Mark D., Wang, Wenwu

Automated audio captioning (AAC) generates textual descriptions of audio content. Existing AAC models achieve good results but only use the high-dimensional representation of the encoder. There is always insufficient information learning of hig

External link: http://arxiv.org/abs/2305.18753

Report

Towards Generating Diverse Audio Captions via Adversarial Training

Authors: Mei, Xinhao, Liu, Xubo, Sun, Jianyuan, Plumbley, Mark D., Wang, Wenwu

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences. This task has attracted increasing attention and substantial progress has been made in recent years. Captions gene

External link: http://arxiv.org/abs/2212.02033

Report

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Authors: Liu, Xubo, Huang, Qiushi, Mei, Xinhao, Liu, Haohe, Kong, Qiuqiang, Sun, Jianyuan, Li, Shengchen, Ko, Tom, Zhang, Yu, Tang, Lilian H., Plumbley, Mark D., Kılıç, Volkan, Wang, Wenwu

Audio captioning aims to generate text descriptions of audio clips. In the real world, many objects produce similar sounds. How to accurately recognize ambiguous sounds is a major challenge for audio captioning. In this work, inspired by inherent hum

External link: http://arxiv.org/abs/2210.16428

Report

Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

Authors: Sun, Jianyuan, Liu, Xubo, Mei, Xinhao, Plumbley, Mark D., Kilic, Volkan, Wang, Wenwu

Automated audio captioning (AAC) aims to describe the content of an audio clip using simple sentences. Existing AAC methods are developed based on an encoder-decoder architecture whose success is attributed to the use of a pre-trained CNN10 called PAN

External link: http://arxiv.org/abs/2210.05037

Report

On Metric Learning for Audio-Text Cross-Modal Retrieval

Authors: Mei, Xinhao, Liu, Xubo, Sun, Jianyuan, Plumbley, Mark D., Wang, Wenwu

Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality. Solving such a cross-modal retrieval task is challenging because it not only requires learning robust feature representa

External link: http://arxiv.org/abs/2203.15537

Report

Deep Neural Decision Forest for Acoustic Scene Classification

Authors: Sun, Jianyuan, Liu, Xubo, Mei, Xinhao, Zhao, Jinzheng, Plumbley, Mark D., Kılıç, Volkan, Wang, Wenwu

Acoustic scene classification (ASC) aims to classify an audio clip based on the characteristic of the recording environment. In this regard, deep learning based approaches have emerged as a useful tool for ASC problems. Conventional approaches to imp

External link: http://arxiv.org/abs/2203.03436

Report

Leveraging Pre-trained BERT for Audio Captioning

Authors: Liu, Xubo, Mei, Xinhao, Huang, Qiushi, Sun, Jianyuan, Zhao, Jinzheng, Liu, Haohe, Plumbley, Mark D., Kılıç, Volkan, Wang, Wenwu

Audio captioning aims at using natural language to describe the content of an audio clip. Existing audio captioning systems are generally based on an encoder-decoder architecture, in which acoustic information is extracted by an audio encoder and the

External link: http://arxiv.org/abs/2203.02838

Report

Diverse Audio Captioning via Adversarial Training

Authors: Mei, Xinhao, Liu, Xubo, Sun, Jianyuan, Plumbley, Mark D., Wang, Wenwu

Audio captioning aims at generating natural language descriptions for audio clips automatically. Existing audio captioning models have shown promising improvement in recent years. However, these models are mostly trained via maximum likelihood estima

External link: http://arxiv.org/abs/2110.06691
