Showing 1 - 10 of 2,190 for search: '"Zhang Xiaoqin"'
We rethink the segment anything model (SAM) and propose a novel multiprompt network called COMPrompter for camouflaged object detection (COD). SAM has zero-shot generalization ability beyond other models and can provide an ideal framework for COD. …
External link:
http://arxiv.org/abs/2411.18858
Test-time prompt tuning, which learns prompts online with unlabelled test samples during the inference stage, has demonstrated great potential by learning effective prompts on-the-fly without requiring any task-specific annotations. However, its …
External link:
http://arxiv.org/abs/2410.20346
Hallucination, a phenomenon where multimodal large language models (MLLMs) tend to generate textual responses that are plausible but unaligned with the image, has become one major hurdle in various MLLM-related applications. Several benchmarks have …
External link:
http://arxiv.org/abs/2410.09962
This paper introduces a new Segment Anything Model with Depth Perception (DSAM) for Camouflaged Object Detection (COD). DSAM exploits the zero-shot capability of SAM to realize precise segmentation in the RGB-D domain. It consists of the …
External link:
http://arxiv.org/abs/2407.12339
Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning …
External link:
http://arxiv.org/abs/2406.05773
Video Object Segmentation (VOS) aims to track objects across frames in a video and segment them based on the initial annotated frame of the target objects. Previous VOS works typically rely on fully annotated videos for training. However, acquiring …
External link:
http://arxiv.org/abs/2405.14010
Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the …
External link:
http://arxiv.org/abs/2405.07696
Author:
Gao, Jin; Lin, Shubo; Wang, Shaoru; Kou, Yutong; Li, Zeming; Li, Liang; Zhang, Congxuan; Zhang, Xiaoqin; Wang, Yizheng; Hu, Weiming
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the extremely simple lightweight …
External link:
http://arxiv.org/abs/2404.12210
Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction. They apply uni-directional attention to …
External link:
http://arxiv.org/abs/2403.07692
Monocular 3D detection (M3D) aims for precise 3D object localization from a single-view image, which usually involves labor-intensive annotation of 3D detection boxes. Weakly supervised M3D has recently been studied to obviate the 3D annotation process …
External link:
http://arxiv.org/abs/2402.19144