Search results - "Wu, Gangshan"

Report

GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

Authors: Wang, Haonan; Liu, Jie; Tang, Jie; Wu, Gangshan; Xu, Bo; Chou, Yanbing; Wang, Yong

In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches face challenges of less applicability in the industrial community due to the large number of parametric quantities and com

External link: http://arxiv.org/abs/2407.10756

Report

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

Authors: Zhu, Yuhan; Ji, Yuyang; Zhao, Zhiyu; Wu, Gangshan; Wang, Limin

Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new

External link: http://arxiv.org/abs/2407.04603

Report

Open-Vocabulary Spatio-Temporal Action Detection

Authors: Wu, Tao; Ge, Shuqiu; Qin, Jie; Wu, Gangshan; Wang, Limin

Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across

External link: http://arxiv.org/abs/2405.10832

Report

STMixer: A One-Stage Sparse Action Detector

Authors: Wu, Tao; Cao, Mengqi; Gao, Ziteng; Wu, Gangshan; Wang, Limin

Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm r

External link: http://arxiv.org/abs/2404.09842

Report

SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

Authors: Wu, Tao; He, Runyu; Wu, Gangshan; Wang, Limin

Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress

External link: http://arxiv.org/abs/2404.04565

Report

Dual DETRs for Multi-Label Temporal Action Detection

Authors: Zhu, Yuhan; Zhang, Guozhen; Tan, Jing; Wu, Gangshan; Wang, Limin

Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task

External link: http://arxiv.org/abs/2404.00653

Report

Spatiotemporal Predictive Pre-training for Robotic Motor Control

Authors: Yang, Jiange; Liu, Bei; Fu, Jianlong; Pan, Bocheng; Wu, Gangshan; Wang, Limin

Robotic motor control necessitates the ability to predict the dynamics of environments and interaction objects. However, advanced self-supervised pre-trained visual representations (PVRs) in robotic motor control, leveraging large-scale egocentric vi

External link: http://arxiv.org/abs/2403.05304

Report

Sketch and Refine: Towards Fast and Accurate Lane Detection

Authors: Chen, Chao; Liu, Jie; Zhou, Chang; Tang, Jie; Wu, Gangshan

Lane detection aims to determine the precise location and shape of lanes on the road. Despite efforts made by current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or k

External link: http://arxiv.org/abs/2401.14729

Report

Asymmetric Masked Distillation for Pre-Training Small Foundation Models

Authors: Zhao, Zhiyu; Huang, Bingkun; Xing, Sen; Wu, Gangshan; Qiao, Yu; Wang, Limin

Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding. Scale is a primary factor influencing the performance of these foundation models. However, these large foundat

External link: http://arxiv.org/abs/2311.03149

Report

Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation

Authors: Zhang, Jiaming; Cui, Yutao; Wu, Gangshan; Wang, Limin

Current prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting their features. On one hand, the decoupled modeling restricts the target information propagation only

External link: http://arxiv.org/abs/2308.13505
