Showing 1 - 10 of 1,869 for search: '"Cheng, De"'
Prompt learning has become a prevalent strategy for adapting vision-language foundation models (VLMs) such as CLIP to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored the potential of using category-r
External link:
http://arxiv.org/abs/2408.14812
Author:
Zhang, Shizhou, Luo, Wenlong, Cheng, De, Yang, Qingchun, Ran, Lingyan, Xing, Yinghui, Zhang, Yanning
In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets, featuring 2,788 distinct identities. To our knowledge, this is
External link:
http://arxiv.org/abs/2408.07500
Video temporal grounding is an emerging topic aiming to identify specific clips within videos. In addition to pre-trained video models, contemporary methods utilize pre-trained vision-language models (VLM) to capture detailed characteristics of diver
External link:
http://arxiv.org/abs/2408.06622
To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained by limited computational resources in this paper. Current on-device training methods focus only on efficient training wit
External link:
http://arxiv.org/abs/2406.08830
Existing prompt-tuning methods have demonstrated impressive performance in continual learning (CL) by selecting and updating relevant prompts in vision-transformer models. In contrast, this paper aims to learn each task by tuning the prompt
External link:
http://arxiv.org/abs/2406.05658
In real-world applications, image degradation caused by adverse weather is complex and varies with weather conditions across days and seasons. Systems in real-world environments constantly encounter adverse weather conditions that ar
External link:
http://arxiv.org/abs/2403.07292
Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to retrieve pedestrian images of the same identity from different modalities without annotations. While prior work focuses on establishing cross-modality pseudo-label associati
External link:
http://arxiv.org/abs/2402.00672
Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhanc
External link:
http://arxiv.org/abs/2312.06323
Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level annotation, explicit-supervision me
External link:
http://arxiv.org/abs/2312.02483
Author:
Zhang, Shizhou, Yang, Qingchun, Cheng, De, Xing, Yinghui, Liang, Guoqiang, Wang, Peng, Zhang, Yanning
In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both UAV and ground surveillance cameras. To
External link:
http://arxiv.org/abs/2308.12712