Showing 1 - 10 of 1,984 results for search: '"Li, GuangYao"'
The Segment Anything Model (SAM) has demonstrated strong performance in image segmentation of natural scene images. However, its effectiveness diminishes markedly when applied to specific scientific domains, such as Scanning Probe Microscope (SPM) im…
External link:
http://arxiv.org/abs/2410.12562
Cross-view geo-localization in GNSS-denied environments aims to determine an unknown location by matching drone-view images with the correct geo-tagged satellite-view images from a large gallery. Recent research shows that learning discriminative ima…
External link:
http://arxiv.org/abs/2408.02408
The Audio Visual Question Answering (AVQA) task aims to answer questions related to various visual objects, sounds, and their interactions in videos. Such naturally multimodal videos contain rich and complex dynamic audio-visual components, with only…
External link:
http://arxiv.org/abs/2407.20693
Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called Reference Audio-Vi…
External link:
http://arxiv.org/abs/2407.10957
Author:
Li, Guangyao, Brydon, P. M. R.
The $j = 3/2$ fermions in cubic crystals or cold atomic gases can form Cooper pairs in both singlet ($J = 0$) and unconventional quintet ($J = 2$) $s$-wave states. Our study utilizes analytical field theory to examine fluctuations in these states wit…
External link:
http://arxiv.org/abs/2405.06111
Author:
Guo, Ruohao, Ying, Xianghua, Chen, Yaru, Niu, Dantong, Li, Guangyao, Qu, Liao, Qi, Yanyu, Zhou, Jinxing, Xing, Bowei, Yue, Wenzhen, Shi, Ji, Wang, Qixun, Zhang, Peiliang, Liang, Buwen
In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in audible videos. To facilitate this research, we intro…
External link:
http://arxiv.org/abs/2310.18709
Audio-visual video parsing is the task of categorizing a video at the segment level with weak labels, and predicting them as audible or visible events. Recent methods for this task leverage the attention mechanism to capture the semantic correlations…
External link:
http://arxiv.org/abs/2310.07517
Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio? In this work, we concentrate on the Audio-Visual Localization and Segmentation tasks but under the deman…
External link:
http://arxiv.org/abs/2309.07929
The Audio-Visual Question Answering (AVQA) task aims to answer questions about different visual objects, sounds, and their associations in videos. Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, where m…
External link:
http://arxiv.org/abs/2308.05421
We live in a world filled with never-ending streams of multimodal information. As a more natural recording of the real scenario, long form audio-visual videos are expected as an important bridge for better exploring and understanding the world. In th…
External link:
http://arxiv.org/abs/2306.09431