Showing 1 - 10 of 317 for search: '"Wu, Jianlong"'
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks. MLLMs involve significant external knowledge within their parameters…
External link:
http://arxiv.org/abs/2410.14154
Knowledge distillation is a mainstream algorithm in model compression: it transfers knowledge from a larger model (the teacher) to a smaller model (the student) to improve the student's performance. Despite many efforts, existing methods mainly investigate…
External link:
http://arxiv.org/abs/2410.14143
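The snippet above cuts off before the paper's own contribution; as background on the objective knowledge distillation typically optimizes, here is a minimal PyTorch sketch of the classic temperature-scaled logit distillation loss (Hinton et al.). The function name and the hyperparameters `T` and `alpha` are illustrative assumptions, not taken from this paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic logit distillation: soft KL term against the teacher
    plus a hard cross-entropy term against the ground-truth labels."""
    # Soften both distributions with temperature T; scale the KL term by T^2
    # so its gradient magnitude stays comparable as T varies.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 10-class problem.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```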
Recently, video-language understanding has achieved great success through large-scale pre-training. However, data scarcity remains a prevailing challenge. This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality…
External link:
http://arxiv.org/abs/2409.19532
Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas, but they are hindered by slow inference speeds and high computational demands during deployment. The most common way to accelerate DMs involves reducing…
External link:
http://arxiv.org/abs/2409.03550
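The most common acceleration route the snippet alludes to is taking fewer denoising steps at sampling time. As a hedged illustration (not the paper's method), the sketch below shows a standard deterministic DDIM update that walks a strided subset of the training timesteps; `eps_model` is a stand-in for any noise-prediction network, and the beta schedule is a generic linear one.

```python
import torch

@torch.no_grad()
def ddim_sample(eps_model, x_T, alphas_cumprod, num_steps=20):
    """Deterministic DDIM sampling: visit only `num_steps` of the
    T timesteps the model was trained on, instead of all of them."""
    T = alphas_cumprod.shape[0]
    timesteps = torch.linspace(T - 1, 0, num_steps).long()
    x = x_T
    for i in range(num_steps - 1):
        t, t_prev = timesteps[i], timesteps[i + 1]
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = eps_model(x, t)                                # predicted noise at step t
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()       # implied clean sample
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps   # eta = 0 (deterministic) step
    return x

# Toy usage with a stand-in model that predicts zero noise.
alphas_cumprod = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 1000), dim=0)
sample = ddim_sample(lambda x, t: torch.zeros_like(x),
                     torch.randn(1, 3, 32, 32), alphas_cumprod)
```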
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples while preserving the knowledge of previously learned classes. Traditional methods widely adopt static adaptation…
External link:
http://arxiv.org/abs/2407.06136
Author:
Yang, Yibo; Li, Xiaojie; Zhou, Zhongzhu; Song, Shuaiwen Leon; Wu, Jianlong; Nie, Liqiang; Ghanem, Bernard
Current parameter-efficient fine-tuning (PEFT) methods build adapters that are largely agnostic of the context of the downstream task to learn, or of the important knowledge to maintain. As a result, there is often a performance gap compared to full-parameter fine-tuning…
External link:
http://arxiv.org/abs/2406.05223
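For context on what a PEFT adapter looks like, here is a minimal LoRA-style linear layer in PyTorch, the best-known adapter family. Note that, much like the context-agnostic adapters this abstract critiques, nothing in it depends on the downstream task; it is background, not the paper's proposed method, and the rank `r` and scaling `alpha` are illustrative defaults.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, where only A and B receive gradients."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pretrained weights fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts as identity
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

# Toy usage: wrap a pretrained projection and fine-tune only A and B.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))
```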
Self-supervised learning has achieved remarkable success in acquiring high-quality representations from unlabeled data. The widely adopted contrastive learning framework aims to learn invariant representations by minimizing the distance between positive pairs…
External link:
http://arxiv.org/abs/2403.12003
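The contrastive objective this abstract refers to is usually the InfoNCE loss: each sample's two augmented views are pulled together while all other samples in the batch act as negatives. A minimal sketch, assuming two view embeddings `z1` and `z2` (the variable names and temperature are assumptions, not from the paper):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE over a batch: positives sit on the diagonal of the
    pairwise cosine-similarity matrix; everything else is a negative."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))   # index i matches view i
    return F.cross_entropy(logits, targets)

# Toy usage with random 128-d embeddings of 32 samples.
z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
loss = info_nce(z1, z2)
```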
Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of the auto-regressive text generation process. This paper addresses these challenges by focusing on the quantization…
External link:
http://arxiv.org/abs/2402.12065
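The abstract truncates before the paper's specific quantization scheme, so as a baseline illustration only, here is per-channel symmetric int8 round-to-nearest weight quantization in PyTorch, the common starting point such work improves on:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Per-output-channel symmetric int8 quantization of a weight matrix:
    one scale per row, round-to-nearest, clamped to the int8 range."""
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float matrix from codes and scales."""
    return q.float() * scale

# Toy usage: quantize a random weight matrix and measure the error.
w = torch.randn(512, 512)
q, s = quantize_int8(w)
err = (dequantize(q, s) - w).abs().mean()
```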
Author:
Dong, Xingning; Guo, Qingpei; Gan, Tian; Wang, Qing; Wu, Jianlong; Ren, Xiangyuan; Cheng, Yuan; Chu, Wei
We present a framework for learning cross-modal video representations by directly pre-training on raw data to facilitate various downstream video-text tasks. Our main contributions lie in the pre-training framework and proxy tasks. First, based on the…
External link:
http://arxiv.org/abs/2401.17773
Misinformation has become a pressing issue. Fake media, in both visual and textual forms, is widespread on the web. While various deepfake detection and text fake news detection methods have been proposed, they are only designed for single-modality forgery…
External link:
http://arxiv.org/abs/2309.14203