Showing 1 - 10 of 42 for search: '"Tang, Mingqian"'
Author:
Pei, Yixuan, Qing, Zhiwu, Cen, Jun, Wang, Xiang, Zhang, Shiwei, Wang, Yaxiong, Tang, Mingqian, Sang, Nong, Qian, Xueming
Recent incremental learning for action recognition usually stores representative videos to mitigate catastrophic forgetting. However, only a few bulky videos can be stored due to the limited memory. To address this problem, we propose FrameMaker, a m
External link:
http://arxiv.org/abs/2211.00833
Author:
Zhang, Xinwei, Jiang, Jianwen, Feng, Yutong, Wu, Zhi-Fan, Zhao, Xibin, Wan, Hai, Tang, Mingqian, Jin, Rong, Gao, Yue
Although a number of studies are devoted to novel category discovery, most of them assume a static setting where both labeled and unlabeled data are given at once for finding new categories. In this work, we focus on the application scenarios where u
External link:
http://arxiv.org/abs/2210.04174
Author:
Yuan, Hangjie, Jiang, Jianwen, Albanie, Samuel, Feng, Tao, Huang, Ziyuan, Ni, Dong, Tang, Mingqian
The task of Human-Object Interaction (HOI) detection targets fine-grained visual parsing of humans interacting with their environment, enabling a broad range of applications. Prior work has demonstrated the benefits of effective architecture design a
External link:
http://arxiv.org/abs/2209.01814
Author:
Cen, Jun, Yun, Peng, Zhang, Shiwei, Cai, Junhao, Luan, Di, Wang, Michael Yu, Liu, Ming, Tang, Mingqian
Current methods for LIDAR semantic segmentation are not robust enough for real-world applications, e.g., autonomous driving, since they are closed-set and static. The closed-set assumption makes the network only able to output labels of trained classes,
External link:
http://arxiv.org/abs/2207.01452
Author:
Wang, Xiang, Zhang, Shiwei, Qing, Zhiwu, Tang, Mingqian, Zuo, Zhengrong, Gao, Changxin, Jin, Rong, Sang, Nong
Current few-shot action recognition methods reach impressive performance by learning discriminative features for each video via episodic training and designing various temporal alignment strategies. Nevertheless, they are limited in that (a) learning
External link:
http://arxiv.org/abs/2204.13423
Author:
Qing, Zhiwu, Zhang, Shiwei, Huang, Ziyuan, Xu, Yi, Wang, Xiang, Tang, Mingqian, Gao, Changxin, Jin, Rong, Sang, Nong
Natural videos provide rich visual contents for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, leading to limited diversity in visual patterns and limited performan
External link:
http://arxiv.org/abs/2204.03017
Author:
Huang, Ziyuan, Zhang, Shiwei, Pan, Liang, Qing, Zhiwu, Tang, Mingqian, Liu, Ziwei, Ang Jr, Marcelo H.
Spatial convolutions are widely used in numerous deep video models. They fundamentally assume spatio-temporal invariance, i.e., using shared weights for every location in different frames. This work presents Temporally-Adaptive Convolutions (TAdaConv)
External link:
http://arxiv.org/abs/2110.06178
The pretrain-finetune paradigm has shown outstanding performance on many applications of deep learning, where a model is pre-trained on an upstream large dataset (e.g. ImageNet) and is then fine-tuned to different downstream tasks. Though for most ca
External link:
http://arxiv.org/abs/2110.06014
Noisy data is prevalent in both the training and testing phases of machine learning systems, which inevitably leads to the degradation of model performance. There have been plenty of works concentrated on learning with in-distributio
External link:
http://arxiv.org/abs/2108.11035
Author:
Ding, Xinpeng, Wang, Nannan, Zhang, Shiwei, Cheng, De, Li, Xiaomeng, Huang, Ziyuan, Tang, Mingqian, Gao, Xinbo
Current approaches for video grounding propose various complex architectures to capture the video-text relations, and have achieved impressive improvements. However, it is hard to learn the complicated multi-modal relations by only architecture desi
External link:
http://arxiv.org/abs/2108.10576