Search results

Report

The Instance-centric Transformer for the RVOS Track of LSVOS Challenge: 3rd Place Solution

Authors: Cao, Bin, Zhang, Yisi, Wang, Hanyi, He, Xingjian, Liu, Jing

Referring Video Object Segmentation is an emerging multi-modal task that aims to segment objects in the video given a natural language expression. In this work, we build two instance-centric models and fuse predicted results from frame-level and inst…

External link: http://arxiv.org/abs/2408.10541

Report

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Seg…

External link: http://arxiv.org/abs/2406.17005

Report

2nd Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Authors: Cao, Bin, Zhang, Yisi, Lin, Xuanxu, He, Xingjian, Zhao, Bo, Liu, Jing

Motion Expression guided Video Segmentation is a challenging task that aims at segmenting objects in the video based on natural language expressions with motion descriptions. Unlike the previous referring video object segmentation (RVOS), this task f…

External link: http://arxiv.org/abs/2406.13939

Report

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

Authors: Wang, Wenxuan, Zhang, Yisi, He, Xingjian, Yan, Yichen, Zhao, Zijia, Wang, Xinlong, Liu, Jing

Visual grounding (VG) aims at locating the foreground entities that match the given natural language expressions. Previous datasets and methods for the classic VG task mainly rely on the prior assumption that the given expression must literally refer to…

External link: http://arxiv.org/abs/2402.11265

Report

Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement

Authors: Wang, Jing, Li, Jinagyun, Chen, Chen, Zhang, Yisi, Shen, Haoran, Zhang, Tianxiang

Few-Shot Segmentation (FSS) aims to accomplish the novel class segmentation task with a few annotated images. Current FSS research based on meta-learning focuses on designing a complex interaction mechanism between the query and support features. Ho…

External link: http://arxiv.org/abs/2312.15731

Report

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

Authors: Wang, Wenxuan, Yue, Tongtian, Zhang, Yisi, Guo, Longteng, He, Xingjian, Wang, Xinlong, Liu, Jing

Referring expression segmentation (RES) aims at segmenting the foreground masks of the entities that match the descriptive natural language expression. Previous datasets and methods for the classic RES task heavily rely on the prior assumption that one e…

External link: http://arxiv.org/abs/2312.08007

Report

CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation

Authors: Wang, Wenxuan, Liu, Jing, He, Xingjian, Zhang, Yisi, Chen, Chen, Shen, Jiachen, Zhang, Yan, Li, Jiangyun

Referring image segmentation (RIS) is a fundamental vision-language task that intends to segment a desired object from an image based on a given natural language expression. Due to the essentially distinct data properties between image and text, most…

External link: http://arxiv.org/abs/2305.11481