Showing 1 - 10 of 1,414 results for search: '"Wang, Shuhui"'
Published in:
IEEE Transactions on Circuits and Systems for Video Technology, 2024
Video Question Answering (VideoQA) represents a crucial intersection between video understanding and language processing, requiring both discriminative unimodal comprehension and sophisticated cross-modal interaction for accurate inference. Despite a…
External link:
http://arxiv.org/abs/2410.09380
Designing effective graph neural networks (GNNs) with message passing has two fundamental challenges, i.e., determining optimal message-passing pathways and designing local aggregators. Previous methods of designing optimal pathways are limited with…
External link:
http://arxiv.org/abs/2407.18480
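The message passing that the abstract above refers to can be sketched as a single aggregation step. This is a minimal illustrative toy (mean aggregation over a fixed adjacency matrix in NumPy); the function name, the 0.5 combine rule, and the example graph are assumptions for illustration, not the pathway-design method the paper proposes.

```python
import numpy as np

def message_passing_step(adj, features):
    """One toy message-passing step: mean-aggregate neighbor features,
    then combine with the node's own features.

    adj:      (n, n) 0/1 adjacency matrix
    features: (n, d) node feature matrix
    """
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1  # avoid division by zero for isolated nodes
    neighbor_mean = (adj @ features) / deg
    return 0.5 * (features + neighbor_mean)  # simple fixed combine rule

# Tiny 3-node path graph: 0 - 1 - 2
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0], [0.0], [1.0]])
out = message_passing_step(adj, x)
```

In a real GNN layer the combine rule would be learned (e.g. a weight matrix plus nonlinearity) and the aggregator itself is one of the design choices the paper discusses.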
Author:
Wang, Shuhui, Sun, Zihan, Hu, Chaochen, Li, Chao, Zhang, Yong, Yao, Yandong, Wang, Hao, Xing, Chunxiao
Recent years have seen massive time-series data generated in many areas. This new scenario brings challenges, particularly in data ingestion, where existing technologies struggle to handle such massive time-series data, leading to…
External link:
http://arxiv.org/abs/2406.05462
Video activity anticipation aims to predict what will happen in the future, with broad application prospects ranging from robot vision to autonomous driving. Despite recent progress, the data uncertainty issue, reflected as the content evo…
External link:
http://arxiv.org/abs/2404.18648
Confusing Pair Correction Based on Category Prototype for Domain Adaptation under Noisy Environments
In this paper, we address unsupervised domain adaptation under noisy environments, which is more challenging and practical than traditional domain adaptation. In this scenario, the model is prone to overfitting noisy labels, resulting in a more prono…
External link:
http://arxiv.org/abs/2403.12883
Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes. It presents significant potential and challenges due to its closer representation of…
External link:
http://arxiv.org/abs/2403.07469
Temporal Sentence Grounding in Video (TSGV) is troubled by a dataset bias issue, caused by the uneven temporal distribution of target moments for samples with similar semantic components in input videos or query texts. Existing methods res…
External link:
http://arxiv.org/abs/2401.07567
Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images given text prompts as input. However, these models fail to convey appropriate spatial composition specified by a layout instruction. In th…
External link:
http://arxiv.org/abs/2310.08872
Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated a…
External link:
http://arxiv.org/abs/2310.08148
Author:
Wang, Yabing, Wang, Shuhui, Luo, Hao, Dong, Jianfeng, Wang, Fan, Han, Meng, Wang, Xun, Wang, Meng
Current research on cross-modal retrieval is mostly English-oriented, owing to the availability of large English-oriented human-labeled vision-language corpora. To break the limit of non-English labeled data, cross-lingual cross-modal…
External link:
http://arxiv.org/abs/2309.05451