Showing 1 - 10 of 773 for search: '"Wu TZ"'
Video understanding typically requires fine-tuning the large backbone when adapting to new domains. In this paper, we leverage the egocentric video foundation models (Ego-VFMs) based on video-language pre-training and propose a parameter-efficient ad
External link:
http://arxiv.org/abs/2407.19520
Research in scene graph generation (SGG) usually considers two-stage models, that is, detecting a set of entities, followed by combining them and labeling all possible relationships. While showing promising results, the pipeline structure induces lar
External link:
http://arxiv.org/abs/2306.05689
Visual-language foundation models, like CLIP, learn generalized representations that enable zero-shot open-set classification. Few-shot adaptation methods, based on prompt tuning, have been shown to further improve performance on downstream datasets.
External link:
http://arxiv.org/abs/2306.02240
Author:
Wu, Tz-Ying, Swaminathan, Gurumurthy, Li, Zhizhong, Ravichandran, Avinash, Vasconcelos, Nuno, Bhotika, Rahul, Soatto, Stefano
Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large
External link:
http://arxiv.org/abs/2204.03634
Significant effort has been recently devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed probl
External link:
http://arxiv.org/abs/2108.09668
Long-tail recognition tackles the natural non-uniformly distributed data in real-world scenarios. While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes. Humans, however, are less affected b
External link:
http://arxiv.org/abs/2007.09898
Multiview recognition has been well studied in the literature and achieves decent performance in object recognition and retrieval tasks. However, most previous works rely on supervised learning and some impractical underlying assumptions, such as the
External link:
http://arxiv.org/abs/2003.12735
Author:
Xu, Yiran, Yang, Xiaoyin, Gong, Lihang, Lin, Hsuan-Chu, Wu, Tz-Ying, Li, Yunsheng, Vasconcelos, Nuno
A new paradigm is proposed for autonomous driving. The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem. While it relies on scene understanding, the latter only considers objects that
External link:
http://arxiv.org/abs/2003.09405
Humans have the amazing ability to perform very subtle manipulation tasks using a closed-loop control system with imprecise mechanics (i.e., our body parts) but rich sensory information (e.g., vision, tactile, etc.). In the closed-loop system, the abi
External link:
http://arxiv.org/abs/1808.01725
Anticipating human intention by observing one's actions has many applications. For instance, picking up a cellphone, then a charger (actions) implies that one wants to charge the cellphone (intention). By anticipating the intention, an intelligent sy
External link:
http://arxiv.org/abs/1710.07477