Showing 1 - 10 of 73 for search: '"Pan, LiYuan"'
The challenge in LLM-based video understanding lies in preserving visual and semantic information in long videos while maintaining a memory-affordable token count. However, redundancy and correspondence in videos have hindered the performance potenti
External link:
http://arxiv.org/abs/2411.12355
Human action recognition (HAR) plays a key role in various applications such as video analysis, surveillance, autonomous driving, robotics, and healthcare. Most HAR algorithms are developed from RGB images, which capture detailed visual information.
External link:
http://arxiv.org/abs/2410.16746
Image dehazing has drawn significant attention in recent years. Learning-based methods usually require paired hazy and corresponding ground truth (haze-free) images for training. However, it is difficult to collect real-world image pairs, which pre
External link:
http://arxiv.org/abs/2410.16095
Fine-grained video action recognition can be conceptualized as a video-text matching problem. Previous approaches often rely on global video semantics to consolidate video embeddings, which can lead to misalignment in video-text pairs due to a lack o
External link:
http://arxiv.org/abs/2410.14238
This paper studies zero-shot object recognition using event camera data. Guided by CLIP, which is pre-trained on RGB images, existing approaches achieve zero-shot object recognition by optimizing embedding similarities between event data and RGB imag
External link:
http://arxiv.org/abs/2407.21616
All-in-one (AiO) frameworks restore various adverse weather degradations with a single set of networks jointly. To handle various weather conditions, an AiO framework is expected to adaptively learn weather-specific knowledge for different degradatio
External link:
http://arxiv.org/abs/2312.01381
This paper introduces a self-supervised learning framework designed for pre-training neural networks tailored to dense prediction tasks using event camera data. Our approach utilizes solely event data for training. Transferring achievements from dens
External link:
http://arxiv.org/abs/2311.11533
With the concept of teaching being introduced to the machine learning community, a teacher model starts using dynamic loss functions to teach the training of a student model. The dynamic intends to set adaptive loss functions to different phases of st
External link:
http://arxiv.org/abs/2310.19313
This paper studies co-segmenting the common semantic object in a set of images. Existing works either rely on carefully engineered networks to mine the implicit semantic information in visual features or require extra data (i.e., classification label
External link:
http://arxiv.org/abs/2308.11506
Recovering sharp images from dual-pixel (DP) pairs with disparity-dependent blur is a challenging task. Existing blur map-based deblurring methods have demonstrated promising results. In this paper, we propose, to the best of our knowledge, the first
External link:
http://arxiv.org/abs/2307.09815