Showing 1 - 10 of 66 for search: '"Chen, Haoxing"'
Recent Vision Mamba models not only have much lower complexity for processing higher-resolution images and longer videos but also achieve competitive performance with Vision Transformers (ViTs). However, they are prone to overfitting and thus only pres
External link:
http://arxiv.org/abs/2408.17081
Authors:
Chen, Haoxing, Hong, Yan, Huang, Zizheng, Xu, Zhuoer, Gu, Zhangxuan, Li, Yaohui, Lan, Jun, Zhu, Huijia, Zhang, Jianfu, Wang, Weiqiang, Li, Huaxiong
Recently, video generation techniques have advanced rapidly. Given the popularity of video content on social media platforms, these models intensify concerns about the spread of fake information. Therefore, there is a growing demand for detectors cap
External link:
http://arxiv.org/abs/2405.19707
We study the problem of few-shot out-of-distribution (OOD) detection, which aims to detect OOD samples from unseen categories during inference time with only a few labeled in-domain (ID) samples. Existing methods mainly focus on training task-aware p
External link:
http://arxiv.org/abs/2405.16146
Authors:
Chen, Haoxing, Li, Yaohui, Huang, Zizheng, Hong, Yan, Xu, Zhuoer, Gu, Zhangxuan, Lan, Jun, Zhu, Huijia, Wang, Weiqiang
Pre-trained large-scale vision-language models (VLMs) have acquired profound understanding of general visual concepts. Recent advancements in efficient transfer learning (ETL) have shown remarkable success in fine-tuning VLMs within the scenario of l
External link:
http://arxiv.org/abs/2404.09872
Contrastive Language-Image Pre-training (CLIP) has shown powerful zero-shot learning performance. Few-shot learning aims to further enhance the transfer capability of CLIP by giving few images in each class, aka 'few shots'. Most existing methods eit
External link:
http://arxiv.org/abs/2404.09778
Image harmonization is a crucial technique in image composition that aims to seamlessly match the background by adjusting the foreground of composite images. Current methods adopt either global-level or pixel-level feature matching. Global-level feat
External link:
http://arxiv.org/abs/2312.12729
Authors:
Chen, Haoxing, Li, Yaohui, Hong, Yan, Huang, Zizheng, Xu, Zhuoer, Gu, Zhangxuan, Lan, Jun, Zhu, Huijia, Wang, Weiqiang
Audio-visual zero-shot learning aims to recognize unseen classes based on paired audio-visual sequences. Recent methods mainly focus on learning multi-modal features aligned with class names to enhance the generalization ability to unseen categories.
External link:
http://arxiv.org/abs/2311.12268
Authors:
Chen, Haoxing, Xu, Zhuoer, Gu, Zhangxuan, Lan, Jun, Zheng, Xing, Li, Yaohui, Meng, Changhua, Zhu, Huijia, Wang, Weiqiang
Diffusion model based language-guided image editing has achieved great success recently. However, existing state-of-the-art diffusion models struggle with rendering correct text and text style during generation. To tackle this problem, we propose a u
External link:
http://arxiv.org/abs/2305.10825
Recent object detection approaches rely on pretrained vision-language models for image-text alignment. However, they fail to detect the Mobile User Interface (MUI) element since it contains additional OCR information, which describes its content and
External link:
http://arxiv.org/abs/2305.09699
Diffusion frameworks have achieved comparable performance with previous state-of-the-art image generation models. Researchers are curious about its variants in discriminative tasks because of its powerful noise-to-image denoising pipeline. This paper
External link:
http://arxiv.org/abs/2212.02773