Showing 1 - 10 of 1,869 for search: '"Cheng, De"'
Prompt learning has become a prevalent strategy for adapting vision-language foundation models (VLMs) such as CLIP to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored the potential of using category-r
External link:
http://arxiv.org/abs/2408.14812
Author:
Zhang, Shizhou, Luo, Wenlong, Cheng, De, Yang, Qingchun, Ran, Lingyan, Xing, Yinghui, Zhang, Yanning
In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is
External link:
http://arxiv.org/abs/2408.07500
Video temporal grounding is an emerging topic aiming to identify specific clips within videos. In addition to pre-trained video models, contemporary methods utilize pre-trained vision-language models (VLM) to capture detailed characteristics of diver
External link:
http://arxiv.org/abs/2408.06622
To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained by limited computational resources in this paper. Current on-device training methods focus only on efficient training wit
External link:
http://arxiv.org/abs/2406.08830
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL) by selecting and updating relevant prompts in vision-transformer models. In contrast, this paper aims to learn each task by tuning the prompt
External link:
http://arxiv.org/abs/2406.05658
In real-world applications, image degradation caused by adverse weather is complex and varies with weather conditions across days and seasons. Systems in real-world environments constantly encounter adverse weather conditions that ar
External link:
http://arxiv.org/abs/2403.07292
Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to retrieve pedestrian images of the same identity from different modalities without annotations. While prior work focuses on establishing cross-modality pseudo-label associati
External link:
http://arxiv.org/abs/2402.00672
Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhanc
External link:
http://arxiv.org/abs/2312.06323
Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level annotation, explicit-supervision me
External link:
http://arxiv.org/abs/2312.02483
Author:
Zhang, Shizhou, Yang, Qingchun, Cheng, De, Xing, Yinghui, Liang, Guoqiang, Wang, Peng, Zhang, Yanning
In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both UAV and ground surveillance cameras. To
External link:
http://arxiv.org/abs/2308.12712