Showing 1 - 10 of 254 results for search: '"YAN, Shilin"'
Author: Hong, Lingyi; Li, Jinglun; Zhou, Xinyu; Yan, Shilin; Guo, Pinxue; Jiang, Kaixun; Chen, Zhaoyu; Gao, Shuyong; Zhang, Wei; Lu, Hong; Zhang, Wenqiang
Transformer-based trackers have established a dominant role in the field of visual object tracking. While these trackers exhibit promising performance, their deployment on resource-constrained devices remains challenging due to inefficiencies. …
External link: http://arxiv.org/abs/2409.17564
Author: Yan, Cilin; Wang, Haochen; Yan, Shilin; Jiang, Xiaolong; Hu, Yao; Kang, Guoliang; Xie, Weidi; Gavves, Efstratios
Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting its ability to perform complex video segmentation requiring reasoning with world knowledge. …
External link: http://arxiv.org/abs/2407.11325
With the rapid development of generative models, discerning AI-generated content has attracted increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been s…
External link: http://arxiv.org/abs/2406.19435
Author: Ma, Feipeng; Xue, Hongwei; Wang, Guangting; Zhou, Yizhou; Rao, Fengyun; Yan, Shilin; Zhang, Yueyi; Wu, Siying; Shou, Mike Zheng; Sun, Xiaoyan
Existing Multimodal Large Language Models (MLLMs) follow the paradigm of perceiving visual information by aligning visual features with the input space of Large Language Models (LLMs) and concatenating visual tokens with text tokens to form a unifi…
External link: http://arxiv.org/abs/2405.20339
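The snippet above describes the common MLLM recipe of projecting visual features into the LLM token space and concatenating them with text tokens. Below is a minimal, self-contained PyTorch sketch of that general paradigm only; the module names, dimensions, and the tiny stand-in "LLM" are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    """Illustrative only: align visual features with the LLM input space,
    then concatenate visual tokens with text tokens into one sequence."""

    def __init__(self, vis_dim=1024, llm_dim=512, vocab_size=1000, n_layers=2):
        super().__init__()
        self.projector = nn.Linear(vis_dim, llm_dim)       # vision -> LLM token space
        self.embed = nn.Embedding(vocab_size, llm_dim)     # text token embeddings
        layer = nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True)
        self.llm = nn.TransformerEncoder(layer, n_layers)  # tiny stand-in for a frozen LLM
        self.head = nn.Linear(llm_dim, vocab_size)

    def forward(self, vis_feats, text_ids):
        vis_tokens = self.projector(vis_feats)             # (B, Nv, llm_dim)
        txt_tokens = self.embed(text_ids)                  # (B, Nt, llm_dim)
        seq = torch.cat([vis_tokens, txt_tokens], dim=1)   # unified visual+text sequence
        return self.head(self.llm(seq))

model = ToyMultimodalLM()
logits = model(torch.randn(1, 16, 1024), torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # torch.Size([1, 24, 1000]): 16 visual + 8 text positions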
Author: Ma, Feipeng; Xue, Hongwei; Wang, Guangting; Zhou, Yizhou; Rao, Fengyun; Yan, Shilin; Zhang, Yueyi; Wu, Siying; Shou, Mike Zheng; Sun, Xiaoyan
Most multi-modal tasks can be formulated as problems of either generation or embedding. Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation and a text encoder for embedding. …
External link: http://arxiv.org/abs/2405.19333
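For readers unfamiliar with the decoupled design this snippet refers to, here is a small PyTorch sketch of that conventional setup: one text encoder produces sentence embeddings for retrieval-style tasks, while a separate text decoder produces logits for generation. All names and sizes are made up for illustration and do not reflect the paper's proposed model.

import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Embedding branch: maps token ids to a single sentence embedding."""
    def __init__(self, vocab=1000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)

    def forward(self, ids):
        return self.encoder(self.embed(ids)).mean(dim=1)   # mean-pool token states

class TextDecoder(nn.Module):
    """Generation branch: maps token ids to next-token logits."""
    def __init__(self, vocab=1000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.decoder = nn.TransformerEncoder(               # causal mask omitted for brevity
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.lm_head = nn.Linear(dim, vocab)

    def forward(self, ids):
        return self.lm_head(self.decoder(self.embed(ids)))

ids = torch.randint(0, 1000, (2, 12))
print(TextEncoder()(ids).shape)   # embeddings: torch.Size([2, 256])
print(TextDecoder()(ids).shape)   # generation logits: torch.Size([2, 12, 1000])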
Author: Hong, Lingyi; Yan, Shilin; Zhang, Renrui; Li, Wanyun; Zhou, Xinyu; Guo, Pinxue; Jiang, Kaixun; Chen, Yiting; Li, Jinglun; Chen, Zhaoyu; Zhang, Wenqiang
Visual object tracking aims to localize the target object in each frame based on its initial appearance in the first frame. Depending on the input modality, tracking tasks can be divided into RGB tracking and RGB+X (e.g., RGB+N and RGB+D) tracking. …
External link: http://arxiv.org/abs/2403.09634
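As a generic illustration of the RGB vs. RGB+X split mentioned above (and not the paper's model), the sketch below shows one tracker head that consumes RGB frames and optionally fuses an auxiliary modality X such as a depth or NIR map; all layer choices are hypothetical.

import torch
import torch.nn as nn

class ToyUnifiedTracker(nn.Module):
    """Toy tracker head: RGB-only by default, RGB+X when an extra modality is given."""

    def __init__(self, dim=64):
        super().__init__()
        self.rgb_backbone = nn.Conv2d(3, dim, 3, stride=2, padding=1)
        self.x_backbone = nn.Conv2d(1, dim, 3, stride=2, padding=1)
        self.head = nn.Conv2d(dim, 1, 1)    # per-location target score map

    def forward(self, rgb, x=None):
        feat = self.rgb_backbone(rgb)
        if x is not None:                   # RGB+X tracking: additively fuse modality X
            feat = feat + self.x_backbone(x)
        return self.head(feat)

tracker = ToyUnifiedTracker()
rgb = torch.randn(1, 3, 128, 128)
depth = torch.randn(1, 1, 128, 128)
print(tracker(rgb).shape, tracker(rgb, depth).shape)  # both torch.Size([1, 1, 64, 64])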
Author: Yan, Shilin; Xu, Xiaohao; Zhang, Renrui; Hong, Lingyi; Chen, Wenchao; Zhang, Wenqiang; Zhang, Wei
Panoramic videos contain richer spatial information and have attracted tremendous attention due to the exceptional experience they offer in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentatio…
External link: http://arxiv.org/abs/2309.12303
Author: Yan, Shilin; Zhang, Renrui; Guo, Ziyu; Chen, Wenchao; Zhang, Wei; Li, Hongyang; Qiao, Yu; Dong, Hao; He, Zhongjiang; Gao, Peng
Recently, video object segmentation (VOS) referred to by multi-modal signals, e.g., language and audio, has attracted increasing attention in both industry and academia. It is challenging to explore the semantic alignment within modalities and the visua…
External link: http://arxiv.org/abs/2305.16318
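The alignment challenge this snippet raises is often tackled with cross-modal attention; the sketch below is a generic example of that idea only (video patch tokens attending to embedded language or audio reference tokens), with all dimensions and names assumed, not taken from the paper.

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic cross-attention: video tokens query reference-modality tokens."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, video_tokens, ref_tokens):
        # video_tokens: (B, N_patches, dim); ref_tokens: (B, N_words_or_audio, dim)
        fused, _ = self.attn(query=video_tokens, key=ref_tokens, value=ref_tokens)
        return self.norm(video_tokens + fused)

fusion = CrossModalFusion()
video = torch.randn(2, 196, 256)   # one frame as 14x14 patch tokens
text = torch.randn(2, 10, 256)     # embedded referring expression (or audio features)
print(fusion(video, text).shape)   # torch.Size([2, 196, 256])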
Author: Zhang, Renrui; Jiang, Zhengkai; Guo, Ziyu; Yan, Shilin; Pan, Junting; Ma, Xianzheng; Dong, Hao; Gao, Peng; Li, Hongsheng
Driven by large-data pre-training, the Segment Anything Model (SAM) has been demonstrated to be a powerful and promptable framework, revolutionizing segmentation models. Despite its generality, customizing SAM for specific visual concepts without man-po…
External link: http://arxiv.org/abs/2305.03048
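To make "promptable framework" concrete, here is a minimal sketch of SAM's public prompt interface from Meta's segment-anything package; it is not the customization method of this paper, the checkpoint path is a placeholder, and running it requires pip install segment-anything plus a downloaded SAM checkpoint and a real image.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="path/to/sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real RGB image (HWC, uint8)
predictor.set_image(image)

# Prompt SAM with a single foreground point; concept-customization methods built on SAM
# typically automate how such prompts are chosen for a user-specific object.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),          # 1 = foreground point
    multimask_output=True,
)
print(masks.shape, scores.shape)         # (3, 480, 640) (3,)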
Author: Zhang, Renrui; Han, Jiaming; Liu, Chris; Gao, Peng; Zhou, Aojun; Hu, Xiangfei; Yan, Shilin; Lu, Pan; Li, Hongsheng; Qiao, Yu
We present LLaMA-Adapter, a lightweight adaptation method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model…
External link: http://arxiv.org/abs/2303.16199
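The snippet mentions adding a small set of learnable parameters on top of a frozen LLaMA. Below is a sketch of the general adapter idea under two assumptions of mine (learnable prompt tokens plus a zero-initialized gate so training starts from the unmodified frozen model); it is not the official LLaMA-Adapter implementation, and all sizes are toy values.

import torch
import torch.nn as nn

class ZeroGatedPromptAttention(nn.Module):
    """A frozen layer's hidden states attend to a few learnable prompt tokens;
    a gate initialized at zero keeps the adapter silent at the start of training."""

    def __init__(self, dim=512, n_prompts=10, heads=8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))    # zero-init: no effect at step 0

    def forward(self, hidden):                      # hidden: (B, T, dim) from the frozen model
        prompts = self.prompts.expand(hidden.size(0), -1, -1)
        adapted, _ = self.attn(query=hidden, key=prompts, value=prompts)
        return hidden + torch.tanh(self.gate) * adapted

layer = ZeroGatedPromptAttention()
h = torch.randn(2, 16, 512)
print(torch.allclose(layer(h), h))  # True before training, thanks to the zero-initialized gate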