Showing 1 - 10 of 217 for search: '"Yuan, Haobo"'
The recent surge in Multimodal Large Language Models (MLLMs) has showcased their remarkable potential for achieving generalized intelligence by integrating visual understanding into Large Language Models. Nevertheless, the sheer model size of MLLMs…
External link:
http://arxiv.org/abs/2407.19409
Author:
Zhang, Tao; Li, Xiangtai; Fei, Hao; Yuan, Haobo; Wu, Shengqiong; Ji, Shunping; Loy, Chen Change; Yan, Shuicheng
Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal…
External link:
http://arxiv.org/abs/2406.19389
Author:
Yuan, Haobo; Li, Xiangtai; Qi, Lu; Zhang, Tao; Yang, Ming-Hsuan; Yan, Shuicheng; Loy, Chen Change
Transformer-based segmentation methods face the challenge of efficient inference when dealing with high-resolution images. Recently, several linear attention architectures, such as Mamba and RWKV, have attracted much attention as they can process long…
External link:
http://arxiv.org/abs/2406.19369
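The linear-attention idea this abstract refers to can be made concrete with a small sketch. This is a generic kernelized linear attention in the style of Katharopoulos et al., not the paper's model, and all names are illustrative: instead of materializing the N x N attention matrix, a running state accumulates key-value products, so cost grows linearly with sequence length.

    # Toy kernelized linear attention (generic sketch, not the paper's model).
    # Softmax attention is O(N^2) in sequence length; the running state S
    # makes each step O(d^2), so one pass over the sequence is O(N).
    import numpy as np

    def linear_attention(Q, K, V, eps=1e-6):
        phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 keeps features positive
        Qf, Kf = phi(Q), phi(K)
        S = np.zeros((K.shape[1], V.shape[1]))  # running sum of outer(k_t, v_t)
        z = np.zeros(K.shape[1])                # running sum of k_t (normalizer)
        out = np.empty_like(V)
        for t in range(Q.shape[0]):
            S += np.outer(Kf[t], V[t])
            z += Kf[t]
            out[t] = (Qf[t] @ S) / (Qf[t] @ z + eps)
        return out

    x = np.random.randn(1024, 64)   # 1024 tokens of dimension 64
    y = linear_attention(x, x, x)   # runtime scales with tokens, not tokens^2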
Author:
Zhang, Tao; Yuan, Haobo; Qi, Lu; Zhang, Jiangning; Zhou, Qianyu; Ji, Shunping; Yan, Shuicheng; Li, Xiangtai
Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such architecture to more efficiently and effectively model point cloud…
External link:
http://arxiv.org/abs/2403.00762
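The linear-complexity claim for state space models comes from their recurrent form: each token updates a fixed-size hidden state at constant cost. A minimal discretized SSM scan, as a generic illustration rather than Point Cloud Mamba itself:

    # Generic discretized state-space recurrence (illustrative, not the paper's model):
    #   h_t = A h_{t-1} + B x_t,   y_t = C h_t
    # Each step costs O(d^2) regardless of sequence length, hence O(N) overall.
    import numpy as np

    def ssm_scan(x, A, B, C):
        h = np.zeros(A.shape[0])
        ys = []
        for x_t in x:             # one pass over the serialized sequence
            h = A @ h + B * x_t   # constant-cost state update
            ys.append(C @ h)      # readout
        return np.array(ys)

    d = 16
    A = 0.9 * np.eye(d)           # stable toy transition matrix
    B, C = np.random.randn(d), np.random.randn(d)
    y = ssm_scan(np.random.randn(1000), A, B, C)

Applying such a scan to point clouds additionally requires serializing the unordered points into a sequence, which is the kind of design question this line of work addresses.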
Author:
Li, Xiangtai; Yuan, Haobo; Li, Wei; Ding, Henghui; Wu, Size; Zhang, Wenwei; Li, Yining; Chen, Kai; Loy, Chen Change
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image…
External link:
http://arxiv.org/abs/2401.10229
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth…
External link:
http://arxiv.org/abs/2401.02955
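The obvious way to combine the two models the abstract contrasts is a two-stage pipeline: SAM proposes class-agnostic masks and CLIP labels each crop. A rough sketch of that naive baseline (not the paper's method), assuming the official segment_anything and clip packages; the checkpoint path, image file, and label vocabulary are placeholders:

    # Naive SAM-then-CLIP baseline: segment everything, label each crop.
    import numpy as np
    import torch, clip
    from PIL import Image
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

    image_np = np.array(Image.open("photo.jpg").convert("RGB"))    # placeholder image
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder checkpoint
    masks = SamAutomaticMaskGenerator(sam).generate(image_np)

    model, preprocess = clip.load("ViT-B/32", device="cpu")
    labels = ["cat", "dog", "car"]                                 # placeholder vocabulary
    with torch.no_grad():
        text = model.encode_text(clip.tokenize(labels))
        for m in masks:
            x, y, w, h = (int(v) for v in m["bbox"])               # XYWH box of the mask
            crop = Image.fromarray(image_np[y:y+h, x:x+w])
            feat = model.encode_image(preprocess(crop).unsqueeze(0))
            print(labels[(feat @ text.T).argmax().item()])

This brute-force pairing runs two large backbones per image, which motivates studying how the two models' strengths can be combined more tightly.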
Author:
Yang, Yibo; Yuan, Haobo; Li, Xiangtai; Wu, Jianlong; Zhang, Lefei; Lin, Zhouchen; Torr, Philip; Tao, Dacheng; Ghanem, Bernard
Enabling learnability for new classes while maintaining capability on old classes has been a crucial challenge for class incremental learning. Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning…
External link:
http://arxiv.org/abs/2308.01746
Author:
Wu, Jianzong; Li, Xiangtai; Xu, Shilin; Yuan, Haobo; Ding, Henghui; Yang, Yibo; Li, Xia; Zhang, Jiangning; Tong, Yunhai; Jiang, Xudong; Ghanem, Bernard; Tao, Dacheng
In the field of visual scene understanding, deep neural networks have made impressive advancements in various core tasks like segmentation, tracking, and detection. However, most approaches operate on the closed-set assumption, meaning that the model…
External link:
http://arxiv.org/abs/2306.15880
Author:
Li, Xiangtai; Ding, Henghui; Yuan, Haobo; Zhang, Wenwei; Pang, Jiangmiao; Cheng, Guangliang; Chen, Kai; Liu, Ziwei; Loy, Chen Change
Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the…
External link:
http://arxiv.org/abs/2304.09854
Author:
Li, Xiangtai; Yuan, Haobo; Zhang, Wenwei; Cheng, Guangliang; Pang, Jiangmiao; Loy, Chen Change
Video segmentation aims to segment and track every pixel in diverse scenarios accurately. In this paper, we present Tube-Link, a versatile framework that addresses multiple core tasks of video segmentation with a unified architecture. Our framework…
External link:
http://arxiv.org/abs/2303.12782