Showing 1 - 10 of 865
for search: '"Zhang, Guozhen"'
Author:
Zhang, Guozhen, Liu, Jingyu, Cao, Shengming, Zhao, Xiaotong, Zhao, Kevin, Ma, Kai, Wang, Limin
Recently, the remarkable success of pre-trained Vision Transformers (ViTs) from image-text matching has sparked an interest in image-to-video adaptation. However, most current approaches retain the full forward pass for each frame, leading to a high…
External link:
http://arxiv.org/abs/2408.06840
Author:
Zhu, Yuhan, Zhang, Guozhen, Xu, Chen, Shen, Haocheng, Chen, Xiaoxin, Wu, Gangshan, Wang, Limin
Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require…
External link:
http://arxiv.org/abs/2408.05775
Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail…
External link:
http://arxiv.org/abs/2407.02315
Large motion poses a critical challenge in the Video Frame Interpolation (VFI) task. Existing methods are often constrained by limited receptive fields, resulting in sub-optimal performance when handling scenarios with large motion. In this paper, we…
External link:
http://arxiv.org/abs/2404.06913
Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task…
External link:
http://arxiv.org/abs/2404.00653
Point-based image editing has attracted remarkable attention since the emergence of DragGAN. Recently, DragDiffusion further pushed forward the generative quality by adapting this dragging technique to diffusion models. Despite these great successes…
External link:
http://arxiv.org/abs/2403.04437
Masked autoencoding has shown excellent performance on self-supervised video representation learning. Temporal redundancy has led to a high masking ratio and a customized masking strategy in VideoMAE. In this paper, we aim to further improve the…
External link:
http://arxiv.org/abs/2308.10794
Author:
Xu, Chen, Zhu, Yuhan, Zhang, Guozhen, Shen, Haocheng, Liao, Yixuan, Chen, Xiaoxin, Wu, Gangshan, Wang, Limin
Prompt learning has emerged as an efficient and effective approach for transferring foundational Vision-Language Models (e.g., CLIP) to downstream tasks. However, current methods tend to overfit to seen categories, thereby limiting their…
External link:
http://arxiv.org/abs/2308.10061
Effectively extracting inter-frame motion and appearance information is important for video frame interpolation (VFI). Previous works either extract both types of information in a mixed way or elaborate separate modules for each type of information…
External link:
http://arxiv.org/abs/2303.00440
Author:
Zhang, Xiaomin, Zhu, Chengfei, Yang, Xiaoli, Ye, Yuanfeng, Zhang, Guozhen, Yu, Feng, Chen, Peng, Zhu, Yong, Kang, Qiannan
Published in:
International Journal of Biological Macromolecules, November 2024, Vol. 280, Part 3