Showing 1 - 10 of 235 for search: '"Wu, Gangshan"'
In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches are less applicable in the industrial community due to their large number of parameters and com
External link:
http://arxiv.org/abs/2407.10756
Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new
External link:
http://arxiv.org/abs/2407.04603
Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across
External link:
http://arxiv.org/abs/2405.10832
Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm r
External link:
http://arxiv.org/abs/2404.09842
Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress
External link:
http://arxiv.org/abs/2404.04565
Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task
External link:
http://arxiv.org/abs/2404.00653
Robotic motor control necessitates the ability to predict the dynamics of environments and interaction objects. However, advanced self-supervised pre-trained visual representations (PVRs) in robotic motor control, leveraging large-scale egocentric vi
External link:
http://arxiv.org/abs/2403.05304
Lane detection aims to determine the precise location and shape of lanes on the road. Despite efforts made by current methods, it remains a challenging task due to the complexity of real-world scenarios. Existing approaches, whether proposal-based or k
External link:
http://arxiv.org/abs/2401.14729
Self-supervised foundation models have shown great potential in computer vision thanks to the pre-training paradigm of masked autoencoding. Scale is a primary factor influencing the performance of these foundation models. However, these large foundat
External link:
http://arxiv.org/abs/2311.03149
Current prevailing Video Object Segmentation (VOS) methods usually perform dense matching between the current and reference frames after extracting their features. On one hand, the decoupled modeling restricts the target information propagation only
External link:
http://arxiv.org/abs/2308.13505