Showing 1 - 10 of 227 for search: '"Wu, Jianlong"'
Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas, while they are hindered by slow inference speeds and high computational demands during deployment. The most common way to accelerate DMs involves reduc
External link:
http://arxiv.org/abs/2409.03550
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples while preserving the knowledge of previously learned classes. Traditional methods widely adopt static adaptation
External link:
http://arxiv.org/abs/2407.06136
Authors:
Yang, Yibo, Li, Xiaojie, Zhou, Zhongzhu, Song, Shuaiwen Leon, Wu, Jianlong, Nie, Liqiang, Ghanem, Bernard
Current parameter-efficient fine-tuning (PEFT) methods build adapters without considering the context of downstream task to learn, or the context of important knowledge to maintain. As a result, there is often a performance gap compared to full-param
External link:
http://arxiv.org/abs/2406.05223
Self-supervised learning has achieved remarkable success in acquiring high-quality representations from unlabeled data. The widely adopted contrastive learning framework aims to learn invariant representations by minimizing the distance between posit
External link:
http://arxiv.org/abs/2403.12003
Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of auto-regressive text generation process. This paper addresses these challenges by focusing on the quanti
External link:
http://arxiv.org/abs/2402.12065
Authors:
Dong, Xingning, Guo, Qingpei, Gan, Tian, Wang, Qing, Wu, Jianlong, Ren, Xiangyuan, Cheng, Yuan, Chu, Wei
We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on th
External link:
http://arxiv.org/abs/2401.17773
Misinformation has become a pressing issue. Fake media, in both visual and textual forms, is widespread on the web. While various deepfake detection and text fake news detection methods have been proposed, they are only designed for single-modality f
External link:
http://arxiv.org/abs/2309.14203
This paper aims to tackle a novel task - Temporal Sentence Grounding in Streaming Videos (TSGSV). The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query. Unlike regular videos, streaming videos are acquired c
External link:
http://arxiv.org/abs/2308.07102
Authors:
Yang, Yibo, Yuan, Haobo, Li, Xiangtai, Wu, Jianlong, Zhang, Lefei, Lin, Zhouchen, Torr, Philip, Tao, Dacheng, Ghanem, Bernard
How to enable learnability for new classes while keeping the capability well on old classes has been a crucial challenge for class incremental learning. Beyond the normal case, long-tail class incremental learning and few-shot class incremental learn
External link:
http://arxiv.org/abs/2308.01746
The composed image retrieval (CIR) task aims to retrieve the desired target image for a given multimodal query, i.e., a reference image with its corresponding modification text. The key limitations encountered by existing efforts are two aspects: 1)
External link:
http://arxiv.org/abs/2305.09979