Showing 1 - 10 of 107 for search: '"Huang, Huaibo"'
Currently, the rapid development of computer vision and deep learning has enabled the creation or manipulation of high-fidelity facial images and videos via deep generative approaches. This technology, also known as deepfake, has achieved dramatic pr…
External link:
http://arxiv.org/abs/2409.14289
Author:
Han, Xiaotian, Jian, Yiren, Hu, Xuefeng, Liu, Haogeng, Wang, Yiqi, Fan, Qihang, Ai, Yuang, Huang, Huaibo, He, Ran, Yang, Zhenheng, You, Quanzeng
Pre-training on large-scale, high-quality datasets is crucial for enhancing the reasoning capabilities of Large Language Models (LLMs), especially in specialized domains such as mathematics. Despite the recognized importance, the Multimodal LLMs (MLL…
External link:
http://arxiv.org/abs/2409.12568
Diffusion-based text-to-image generation models have significantly advanced the field of art content synthesis. However, current portrait stylization methods generally require either model fine-tuning based on examples or the employment of DDIM Inver…
External link:
http://arxiv.org/abs/2408.05492
Author:
Liu, Xuannan, Li, Zekun, Li, Peipei, Xia, Shuhan, Cui, Xing, Huang, Linzhi, Huang, Huaibo, Deng, Weihong, He, Zhaofeng
Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-sourc…
External link:
http://arxiv.org/abs/2406.08772
Author:
Liu, Haogeng, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Huang, Huaibo, He, Ran, Yang, Hongxia
In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relativ…
External link:
http://arxiv.org/abs/2405.17815
The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism's quadratic complexity poses substantial computational burdens. A common remedy spatially groups tokens for self-…
External link:
http://arxiv.org/abs/2405.13337
In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspi…
External link:
http://arxiv.org/abs/2405.13335
Author:
Fan, Qihang, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Tao, Yunzhe, Huang, Huaibo, He, Ran, Yang, Hongxia
This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen d…
External link:
http://arxiv.org/abs/2403.18361
Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns. Current methods generalize poorly across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Di…
External link:
http://arxiv.org/abs/2403.10098
Author:
Liu, Xuannan, Li, Peipei, Huang, Huaibo, Li, Zekun, Cui, Xing, Liang, Jiahao, Qin, Lixiong, Deng, Weihong, He, Zhaofeng
The massive generation of multimodal fake news involving both text and images exhibits substantial distribution discrepancies, prompting the need for generalized detectors. However, the insulated nature of training restricts the capability of classic…
External link:
http://arxiv.org/abs/2403.01988