Showing 1 - 10 of 79 for search: '"Shao, Yuanjie"'
Open-vocabulary semantic segmentation is a challenging task that requires the model to output semantic masks of an image beyond a closed-set vocabulary. Although many efforts have been made to utilize powerful CLIP models to accomplish this task, th
External link:
http://arxiv.org/abs/2406.09829
Object detection, a subfield of computer vision that aims to accurately identify and locate specific objects in images or videos, has achieved remarkable progress. Such methods rely on large-scale labeled training samples for each object ca
External link:
http://arxiv.org/abs/2404.04799
Cross-domain few-shot classification (CD-FSC) aims to identify novel target classes with a few samples, assuming that there exists a domain shift between source and target domains. Existing state-of-the-art practices typically pre-train on source dom
External link:
http://arxiv.org/abs/2308.00727
This technical report presents our first-place solution for the temporal action detection task in the CVPR-2022 ActivityNet Challenge. The task aims to localize the temporal boundaries of action instances with specific classes in long untrimmed videos.
External link:
http://arxiv.org/abs/2206.09082
Published in:
In Pattern Recognition, July 2024, 151
Author:
Wu, Yuhang, Huang, Tengteng, Yao, Haotian, Zhang, Chi, Shao, Yuanjie, Han, Chuchu, Gao, Changxin, Sang, Nong
Recently, many approaches tackle the Unsupervised Domain Adaptive person re-identification (UDA re-ID) problem through pseudo-label-based contrastive learning. During training, a uni-centroid representation is obtained by simply averaging all the ins
External link:
http://arxiv.org/abs/2112.11689
Deep learning-based methods for low-light image enhancement typically require enormous amounts of paired training data, which are impractical to capture in real-world scenarios. Recently, unsupervised approaches have been explored to eliminate the reliance on p
External link:
http://arxiv.org/abs/2112.01766
The fully convolutional network (FCN) has achieved tremendous success in dense visual recognition tasks, such as scene segmentation. The last layer of an FCN is typically a global classifier (1x1 convolution) that assigns each pixel a semantic label.
External link:
http://arxiv.org/abs/2109.10322
Author:
Wang, Xiang, Zhang, Shiwei, Qing, Zhiwu, Shao, Yuanjie, Zuo, Zhengrong, Gao, Changxin, Sang, Nong
Most recent approaches for online action detection tend to apply Recurrent Neural Networks (RNNs) to capture long-range temporal structure. However, RNNs suffer from non-parallelism and vanishing gradients, and are therefore hard to optimize. In this pape
External link:
http://arxiv.org/abs/2106.11149
Author:
Wang, Xiang, Qing, Zhiwu, Huang, Ziyuan, Feng, Yutong, Zhang, Shiwei, Jiang, Jianwen, Tang, Mingqian, Shao, Yuanjie, Sang, Nong
Published in:
CVPRW-2021
The Weakly-Supervised Temporal Action Localization (WS-TAL) task aims to recognize and localize the temporal starts and ends of action instances in an untrimmed video with only video-level label supervision. Due to the lack of negative samples of background cate
External link:
http://arxiv.org/abs/2106.11811