Showing 1 - 10 of 42 for search: '"Henghui Ding"'
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence. 45:7900-7916
We propose a Vision-Language Transformer (VLT) framework for referring segmentation to facilitate deep interactions among multi-modal information and enhance the holistic understanding of vision-language features. There are different ways to understand …
Author:
Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, Dacheng Tao
Published in:
IEEE Transactions on Pattern Analysis and Machine Intelligence. :1-8
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment, and track each instance in a video sequence. Existing approaches are mainly based on single-frame features or single-scale features of multiple …
Published in:
IEEE Transactions on Image Processing. 31:2421-2432
Image matting has attracted growing interest in recent years for its wide applications in numerous vision tasks. Most previous image matting methods rely on trimaps as auxiliary input to define the foreground, background, and unknown region. However, …
Open Vocabulary Instance Segmentation
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::44d64b3cdf531960c97562cde12c6e9a
https://doi.org/10.36227/techrxiv.22082723
We address the problem of referring image segmentation, which aims to generate a mask for the object specified by a natural language expression. Many recent works utilize Transformers to extract features for the target object by aggregating the attended …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b57e87c4dee62a8d12b00ce1d3fdeeef
Published in:
2022 International Joint Conference on Neural Networks (IJCNN).
Published in:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Published in:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Numerous advancements in deep learning can be attributed to access to large-scale and well-annotated datasets. However, such datasets are prohibitively expensive in 3D computer vision due to the substantial collection cost. To alleviate this issue, …
Modern GANs excel at generating high-quality and diverse images. However, when transferring pretrained GANs to small target datasets (e.g., 10-shot), the generator tends to replicate the training samples. Several methods have been proposed to address …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::aacb57881d41f85b2d41b4c38dbef4b4
http://arxiv.org/abs/2205.03805
Referring segmentation aims to generate a segmentation mask for the target instance indicated by a natural language expression. There are typically two kinds of existing methods: one-stage methods that directly perform segmentation on the fused vision …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2ca5d2f67b0bbbceca16aa01ba0b47cf
http://arxiv.org/abs/2204.12109