Showing 1 - 10 of 35 for search: '"Zhou, Chunluan"'
Falling objects from buildings can cause severe injuries to pedestrians due to the great impact force they exert. Although surveillance cameras are installed around some buildings, it is challenging for humans to capture such events in surveillance v
External link:
http://arxiv.org/abs/2408.05750
We present a Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards effective and efficient zero-shot video-text retrieval, dubbed M2-RAAP. Upon popular image-text models like CLIP, most current adaptation-based video-text pre-trainin
External link:
http://arxiv.org/abs/2401.17797
Author:
Zhai, Yuanhao; Liu, Ziyi; Wu, Zhenyu; Wu, Yi; Zhou, Chunluan; Doermann, David; Yuan, Junsong; Hua, Gang
Deep learning models have a risk of utilizing spurious clues to make predictions, such as recognizing actions based on the background scene. This issue can severely degrade the open-set action recognition performance when the testing samples have dif
External link:
http://arxiv.org/abs/2309.01265
Motivated by the superior performance of image diffusion models, more and more researchers strive to extend these models to the text-based video editing task. Nevertheless, current video editing tasks mainly suffer from the dilemma between the high f
External link:
http://arxiv.org/abs/2308.10648
Compared with previous two-stream trackers, the recent one-stream tracking pipeline, which allows earlier interaction between the template and search region, has achieved a remarkable performance gain. However, existing one-stream trackers always let
External link:
http://arxiv.org/abs/2303.16580
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role. However, the independent correlation computation in the attention mechanism could result in noisy and ambiguous attention weig
External link:
http://arxiv.org/abs/2207.09603
Knowledge distillation is widely adopted in semantic segmentation to reduce the computation cost. The previous knowledge distillation methods for semantic segmentation focus on pixel-wise feature alignment and intra-class feature variation distillatio
External link:
http://arxiv.org/abs/2205.03650
Multi-person pose estimation and tracking serve as crucial steps for video understanding. Most state-of-the-art approaches rely on first estimating poses in each frame and only then implementing data association and refinement. Despite the promising
External link:
http://arxiv.org/abs/2106.03772
In this paper, we study the actor-action semantic segmentation problem, which requires joint labeling of both actor and action categories in video frames. One major challenge for this task is that when an actor performs an action, different body part
External link:
http://arxiv.org/abs/1807.08430
Occlusions, complex backgrounds, scale variations and non-uniform distributions present great challenges for crowd counting in practical applications. In this paper, we propose a novel method using an attention model to exploit head locations which a
External link:
http://arxiv.org/abs/1806.10287