Search results - "Feng, Zhenhua"

Report

SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion

Authors: Dai, Ming; Yang, Lingfeng; Xu, Yihao; Feng, Zhenhua; Yang, Wankou

Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder archi

External link: http://arxiv.org/abs/2409.17531

Report

Probabilistically Aligned View-unaligned Clustering with Adaptive Template Selection

Authors: Dong, Wenhua; Wu, Xiao-Jun; Feng, Zhenhua; Atito, Sara; Awais, Muhammad; Kittler, Josef

In most existing multi-view modeling scenarios, cross-view correspondence (CVC) between instances of the same target from different views, like paired image-text data, is a crucial prerequisite for effortlessly deriving a consistent representation. N

External link: http://arxiv.org/abs/2409.14882

Report

C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition

Authors: Li, Rongchang; Feng, Zhenhua; Xu, Tianyang; Li, Linze; Wu, Xiao-Jun; Awais, Muhammad; Atito, Sara; Kittler, Josef

Compositional actions consist of dynamic (verbs) and static (objects) concepts. Humans can easily recognize unseen compositions using the learned concepts. For machines, solving such a problem requires a model to recognize unseen actions composed of

External link: http://arxiv.org/abs/2407.06113

Report

Investigating Self-Supervised Methods for Label-Efficient Learning

Authors: Nandam, Srinivasa Rao; Atito, Sara; Feng, Zhenhua; Kittler, Josef; Awais, Muhammad

Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks like classification, segmentation and detection. The low-shot learning capability of thes

External link: http://arxiv.org/abs/2406.17460

Report

Pseudo Labelling for Enhanced Masked Autoencoders

Authors: Nandam, Srinivasa Rao; Atito, Sara; Feng, Zhenhua; Kittler, Josef; Awais, Muhammad

Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating additional architec

External link: http://arxiv.org/abs/2406.17450

Report

Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

Authors: Tang, Zhangyong; Xu, Tianyang; Feng, Zhenhua; Zhu, Xuefeng; Wang, He; Shao, Pengcheng; Cheng, Chunyang; Wu, Xiao-Jun; Awais, Muhammad; Atito, Sara; Kittler, Josef

RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing

External link: http://arxiv.org/abs/2405.00168

Report

DailyMAE: Towards Pretraining Masked Autoencoders in One Day

Authors: Wu, Jiantao; Mo, Shentong; Atito, Sara; Feng, Zhenhua; Kittler, Josef; Awais, Muhammad

Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representation from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting

External link: http://arxiv.org/abs/2404.00509

Report

Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning

Authors: Wu, Jiantao; Mo, Shentong; Atito, Sara; Kittler, Josef; Feng, Zhenhua; Awais, Muhammad

Recently, self-supervised metric learning has raised attention for the potential to learn a generic distance function. It overcomes the limitations of conventional supervised one, e.g., scalability and label biases. Despite progress in this domain, c

External link: http://arxiv.org/abs/2312.01118

Report

SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition

Authors: Wu, Cong; Wu, Xiao-Jun; Kittler, Josef; Xu, Tianyang; Atito, Sara; Awais, Muhammad; Feng, Zhenhua

Contrastive learning has achieved great success in skeleton-based action recognition. However, most existing approaches encode the skeleton sequences as entangled spatiotemporal representations and confine the contrasts to the same level of represent

External link: http://arxiv.org/abs/2309.05834

Report

Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding

Authors: Wu, Jiantao; Mo, Shentong; Awais, Muhammad; Atito, Sara; Feng, Zhenhua; Kittler, Josef

Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. In the realm of computer vision, pretrained vision transformers (ViTs) have

External link: http://arxiv.org/abs/2308.11448
