Showing 1 - 10 of 86 for search: '"Huang, Qidong"'
Author:
Xing, Long, Huang, Qidong, Dong, Xiaoyi, Lu, Jiajie, Zhang, Pan, Zang, Yuhang, Cao, Yuhang, He, Conghui, Wang, Jiaqi, Wu, Feng, Lin, Dahua
In large vision-language models (LVLMs), images serve as inputs that carry a wealth of information. As the idiom "A picture is worth a thousand words" implies, representing a single image in current LVLMs can require hundreds or even thousands of tokens…
External link:
http://arxiv.org/abs/2410.17247
Author:
Huang, Qidong, Dong, Xiaoyi, Zhang, Pan, Zang, Yuhang, Cao, Yuhang, Wang, Jiaqi, Lin, Dahua, Zhang, Weiming, Yu, Nenghai
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). Large-scale pre-training plays a critical role in building capable…
External link:
http://arxiv.org/abs/2410.07167
Despite the success of diffusion-based customization methods on visual content creation, increasing concerns have been raised about such techniques from both privacy and political perspectives. To tackle this issue, several anti-customization methods…
External link:
http://arxiv.org/abs/2312.07865
Author:
Huang, Qidong, Dong, Xiaoyi, Zhang, Pan, Wang, Bin, He, Conghui, Wang, Jiaqi, Lin, Dahua, Zhang, Weiming, Yu, Nenghai
Hallucination, posed as a pervasive challenge of multi-modal large language models (MLLMs), has significantly impeded their real-world usage that demands precise judgment. Existing methods mitigate this issue with either training with specific design…
External link:
http://arxiv.org/abs/2311.17911
Author:
Huang, Qidong, Dong, Xiaoyi, Chen, Dongdong, Chen, Yinpeng, Yuan, Lu, Hua, Gang, Zhang, Weiming, Yu, Nenghai
In this paper, we investigate the adversarial robustness of vision transformers that are equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining…
External link:
http://arxiv.org/abs/2308.10315
Author:
Huang, Qidong, Dong, Xiaoyi, Chen, Dongdong, Zhang, Weiming, Wang, Feifei, Hua, Gang, Yu, Nenghai
We present Diversity-Aware Meta Visual Prompting~(DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with frozen backbone. A challenging issue in visual prompting is that image datasets sometimes…
External link:
http://arxiv.org/abs/2303.08138
With the deterioration of climate, the phenomenon of rain-induced flooding has become frequent. To mitigate its impact, recent works adopt convolutional neural network or its variants to predict the floods. However, these methods directly force the model…
External link:
http://arxiv.org/abs/2212.01819
Deep 3D point cloud models are sensitive to adversarial attacks, which poses threats to safety-critical applications such as autonomous driving. Robust training and defend-by-denoising are typical strategies for defending adversarial perturbations. However…
External link:
http://arxiv.org/abs/2211.16247
Author:
Huang, Qidong, Dong, Xiaoyi, Chen, Dongdong, Zhou, Hang, Zhang, Weiming, Zhang, Kui, Hua, Gang, Yu, Nenghai
Notwithstanding the prominent performance achieved in various applications, point cloud recognition models have often suffered from natural corruptions and adversarial perturbations. In this paper, we delve into boosting the general robustness of point…
External link:
http://arxiv.org/abs/2209.07788
Adversary and invisibility are two fundamental but conflicting characters of adversarial perturbations. Previous adversarial attacks on 3D point cloud recognition have often been criticized for their noticeable point outliers, since they just involve an…
External link:
http://arxiv.org/abs/2203.04041