Výsledky vyhledávání

Report

General Compression Framework for Efficient Transformer Object Tracking

Autor: Hong, Lingyi, Li, Jinglun, Zhou, Xinyu, Yan, Shilin, Guo, Pinxue, Jiang, Kaixun, Chen, Zhaoyu, Gao, Shuyong, Zhang, Wei, Lu, Hong, Zhang, Wenqiang

Transformer-based trackers have established a dominant role in the field of visual object tracking. While these trackers exhibit promising performance, their deployment on resource-constrained devices remains challenging due to inefficiencies. To imp

Externí odkaz: http://arxiv.org/abs/2409.17564

Zobrazit plný text záznamu

Report

VISA: Reasoning Video Object Segmentation via Large Language Models

Autor: Yan, Cilin, Wang, Haochen, Yan, Shilin, Jiang, Xiaolong, Hu, Yao, Kang, Guoliang, Xie, Weidi, Gavves, Efstratios

Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we in

Externí odkaz: http://arxiv.org/abs/2407.11325

Zobrazit plný text záznamu

Report

A Sanity Check for AI-generated Image Detection

Autor: Yan, Shilin, Li, Ouxiang, Cai, Jiayin, Hao, Yanbin, Jiang, Xiaolong, Hu, Yao, Xie, Weidi

With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been s

Externí odkaz: http://arxiv.org/abs/2406.19435

Zobrazit plný text záznamu

Report

Visual Perception by Large Language Model's Weights

Autor: Ma, Feipeng, Xue, Hongwei, Wang, Guangting, Zhou, Yizhou, Rao, Fengyun, Yan, Shilin, Zhang, Yueyi, Wu, Siying, Shou, Mike Zheng, Sun, Xiaoyan

Existing Multimodal Large Language Models (MLLMs) follow the paradigm that perceives visual information by aligning visual features with the input space of Large Language Models (LLMs), and concatenating visual tokens with text tokens to form a unifi

Externí odkaz: http://arxiv.org/abs/2405.20339

Zobrazit plný text záznamu

Report

Multi-Modal Generative Embedding Model

Autor: Ma, Feipeng, Xue, Hongwei, Wang, Guangting, Zhou, Yizhou, Rao, Fengyun, Yan, Shilin, Zhang, Yueyi, Wu, Siying, Shou, Mike Zheng, Sun, Xiaoyan

Most multi-modal tasks can be formulated into problems of either generation or embedding. Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation, and a text encoder for embedding.

Externí odkaz: http://arxiv.org/abs/2405.19333

Zobrazit plný text záznamu

Report

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Autor: Hong, Lingyi, Yan, Shilin, Zhang, Renrui, Li, Wanyun, Zhou, Xinyu, Guo, Pinxue, Jiang, Kaixun, Chen, Yiting, Li, Jinglun, Chen, Zhaoyu, Zhang, Wenqiang

Visual object tracking aims to localize the target object of each frame based on its initial appearance in the first frame. Depending on the input modility, tracking tasks can be divided into RGB tracking and RGB+X (e.g. RGB+N, and RGB+D) tracking. D

Externí odkaz: http://arxiv.org/abs/2403.09634

Zobrazit plný text záznamu

Report

PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation

Autor: Yan, Shilin, Xu, Xiaohao, Zhang, Renrui, Hong, Lingyi, Chen, Wenchao, Zhang, Wenqiang, Zhang, Wei

Panoramic videos contain richer spatial information and have attracted tremendous amounts of attention due to their exceptional experience in some fields such as autonomous driving and virtual reality. However, existing datasets for video segmentatio

Externí odkaz: http://arxiv.org/abs/2309.12303

Zobrazit plný text záznamu

Report

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation

Autor: Yan, Shilin, Zhang, Renrui, Guo, Ziyu, Chen, Wenchao, Zhang, Wei, Li, Hongyang, Qiao, Yu, Dong, Hao, He, Zhongjiang, Gao, Peng

Recently, video object segmentation (VOS) referred by multi-modal signals, e.g., language and audio, has evoked increasing attention in both industry and academia. It is challenging for exploring the semantic alignment within modalities and the visua

Externí odkaz: http://arxiv.org/abs/2305.16318

Zobrazit plný text záznamu

Report

Personalize Segment Anything Model with One Shot

Autor: Zhang, Renrui, Jiang, Zhengkai, Guo, Ziyu, Yan, Shilin, Pan, Junting, Ma, Xianzheng, Dong, Hao, Gao, Peng, Li, Hongsheng

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the generality, customizing SAM for specific visual concepts without man-po

Externí odkaz: http://arxiv.org/abs/2305.03048

Zobrazit plný text záznamu

Report

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

Autor: Zhang, Renrui, Han, Jiaming, Liu, Chris, Gao, Peng, Zhou, Aojun, Hu, Xiangfei, Yan, Shilin, Lu, Pan, Li, Hongsheng, Qiao, Yu

We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model

Externí odkaz: http://arxiv.org/abs/2303.16199

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání