Showing 1 - 10 of 63 for search: '"Zhao, Peisen"'
Author:
Shi, Bowen, Zhao, Peisen, Wang, Zichen, Zhang, Yuhang, Wang, Yaoming, Li, Jin, Dai, Wenrui, Zou, Junni, Xiong, Hongkai, Tian, Qi, Zhang, Xiaopeng
Vision-language foundation models, represented by Contrastive Language-Image Pre-training (CLIP), have gained increasing attention for jointly understanding both vision and textual tasks. However, existing approaches primarily focus on training model…
External link:
http://arxiv.org/abs/2401.06397
Transformers have become the primary backbone of the computer vision community due to their impressive performance. However, the unfriendly computation cost impedes their potential in the video recognition domain. To optimize the speed-accuracy trade-off…
External link:
http://arxiv.org/abs/2308.04549
Author:
Ju, Chen, Li, Zeqian, Zhao, Peisen, Zhang, Ya, Zhang, Xiaopeng, Tian, Qi, Wang, Yanfeng, Xie, Weidi
In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario, with the goal of detecting and classifying the action instances from arbitrary categories within some untrimmed videos, even not seen…
External link:
http://arxiv.org/abs/2303.11732
Author:
Ju, Chen, Wang, Haicheng, Liu, Jinxiang, Ma, Chaofan, Zhang, Ya, Zhao, Peisen, Chang, Jianlong, Tian, Qi
Temporal sentence grounding aims to detect the event timestamps described by the natural language query from given untrimmed videos. The existing fully-supervised setting achieves great performance but requires expensive annotation costs; while the weakly-supervised…
External link:
http://arxiv.org/abs/2302.09850
Author:
Ju, Chen, Zheng, Kunhao, Liu, Jinxiang, Zhao, Peisen, Zhang, Ya, Chang, Jianlong, Wang, Yanfeng, Tian, Qi
Weakly-supervised temporal action localization (WTAL) learns to detect and classify action instances with only category labels. Most methods widely adopt the off-the-shelf Classification-Based Pre-training (CBP) to generate video features for action…
External link:
http://arxiv.org/abs/2212.09335
Weakly-supervised temporal action localization aims to localize actions in untrimmed videos with only video-level action category labels. Most previous methods ignore the incompleteness issue of Class Activation Sequences (CAS), suffering from tri…
External link:
http://arxiv.org/abs/2104.02357
Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance. Existing methods adopt the frame-level prediction paradigm to learn from the sparse single-frame…
External link:
http://arxiv.org/abs/2012.08236
This paper explores semi-supervised anomaly detection, a more practical setting for anomaly detection where a small additional set of labeled samples is provided. We propose a new KL-divergence based objective function for semi-supervised anomaly detection…
External link:
http://arxiv.org/abs/2012.04905
Online Action Detection (OAD) in videos is proposed as a per-frame labeling task to address real-time prediction tasks that can only access the previous and current video frames. This paper presents a novel learning-with-privileged based framework…
External link:
http://arxiv.org/abs/2011.09158
Video-based action recognition has recently attracted much attention in the field of computer vision. To solve more complex recognition tasks, it has become necessary to distinguish different levels of interclass variations. Inspired by a common flow…
External link:
http://arxiv.org/abs/2007.06149