Showing 1 - 10 of 51
for search: '"Hui, Tianrui"'
Author:
Li, Hongyu, Hui, Tianrui, Ding, Zihan, Zhang, Jing, Ma, Bin, Wei, Xiaoming, Han, Jizhong, Liu, Si
Panoptic narrative grounding (PNG), whose core target is fine-grained image-text alignment, requires a panoptic segmentation of referred objects given a narrative caption. Previous discriminative methods achieve only weak or coarse-grained alignment
External link:
http://arxiv.org/abs/2409.08251
Author:
Huang, Shaofei, Ling, Rui, Li, Hongyu, Hui, Tianrui, Tang, Zongheng, Wei, Xiaoming, Han, Jizhong, Liu, Si
In this paper, we propose an Audio-Language-Referenced SAM 2 (AL-Ref-SAM 2) pipeline to explore the training-free paradigm for audio- and language-referenced video object segmentation, namely the AVS and RVOS tasks. The intuitive solution leverages Ground
External link:
http://arxiv.org/abs/2408.15876
Author:
He, Runze, Huang, Shaofei, Nie, Xuecheng, Hui, Tianrui, Liu, Luoqi, Dai, Jiao, Han, Jizhong, Li, Guanbin, Liu, Si
In this paper, we target the adaptive source-driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt. However, obtaining the desired editing results conforming to the editin
External link:
http://arxiv.org/abs/2312.01663
Author:
Hui, Tianrui, Ding, Zihan, Huang, Junshi, Wei, Xiaoming, Wei, Xiaolin, Dai, Jiao, Han, Jizhong, Liu, Si
Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption.
External link:
http://arxiv.org/abs/2311.01091
As one of the fundamental functions of an autonomous driving system, freespace detection aims at classifying each pixel of the image captured by the camera as drivable or non-drivable. Current works on freespace detection heavily rely on a large amount of
External link:
http://arxiv.org/abs/2210.02991
Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image. The previous two-stage approach first extracts segmentation region p
External link:
http://arxiv.org/abs/2208.05647
Referring video object segmentation aims to predict foreground labels for objects referred to by natural language expressions in videos. Previous methods either depend on 3D ConvNets or incorporate additional 2D ConvNets as encoders to extract mixed spa
External link:
http://arxiv.org/abs/2206.03789
Author:
Wang, Jinsheng, Ma, Yinchao, Huang, Shaofei, Hui, Tianrui, Wang, Fei, Qian, Chen, Zhang, Tianzhu
Lane detection is a challenging task that requires predicting complex topology shapes of lane lines and distinguishing different types of lanes simultaneously. Earlier works follow a top-down roadmap to regress predefined anchors into various shapes
External link:
http://arxiv.org/abs/2204.07335
Author:
Huang, Shaofei, Hui, Tianrui, Gong, Yue, Peng, Fengguang, Fang, Yuqiang, Wang, Jingwei, Ma, Bin, Wei, Xiaoming, Han, Jizhong
Published in:
Computer Vision and Image Understanding, vol. 247, October 2024
Recently proposed fine-grained 3D visual grounding is an essential and challenging task whose goal is to identify the 3D object referred to by a natural language sentence among other distracting objects of the same category. Existing works usually adopt
External link:
http://arxiv.org/abs/2108.02388