Showing 1 - 10 of 107 for search: '"Huang, Huaibo"'
Currently, the rapid development of computer vision and deep learning has enabled the creation or manipulation of high-fidelity facial images and videos via deep generative approaches. This technology, also known as deepfake, has achieved dramatic pr…
External link:
http://arxiv.org/abs/2409.14289
Author:
Han, Xiaotian, Jian, Yiren, Hu, Xuefeng, Liu, Haogeng, Wang, Yiqi, Fan, Qihang, Ai, Yuang, Huang, Huaibo, He, Ran, Yang, Zhenheng, You, Quanzeng
Pre-training on large-scale, high-quality datasets is crucial for enhancing the reasoning capabilities of Large Language Models (LLMs), especially in specialized domains such as mathematics. Despite the recognized importance, the Multimodal LLMs (MLL…
External link:
http://arxiv.org/abs/2409.12568
Diffusion-based text-to-image generation models have significantly advanced the field of art content synthesis. However, current portrait stylization methods generally require either model fine-tuning based on examples or the employment of DDIM Inver…
External link:
http://arxiv.org/abs/2408.05492
Author:
Liu, Xuannan, Li, Zekun, Li, Peipei, Xia, Shuhan, Cui, Xing, Huang, Linzhi, Huang, Huaibo, Deng, Weihong, He, Zhaofeng
Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-sourc…
External link:
http://arxiv.org/abs/2406.08772
Author:
Liu, Haogeng, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Huang, Huaibo, He, Ran, Yang, Hongxia
In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relativ…
External link:
http://arxiv.org/abs/2405.17815
The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism's quadratic complexity poses substantial computational burdens. A common remedy spatially groups tokens for self-…
External link:
http://arxiv.org/abs/2405.13337
In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspi…
External link:
http://arxiv.org/abs/2405.13335
Author:
Fan, Qihang, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Tao, Yunzhe, Huang, Huaibo, He, Ran, Yang, Hongxia
This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen d…
External link:
http://arxiv.org/abs/2403.18361
Blind face restoration (BFR) is a highly challenging problem due to the uncertainty of degradation patterns. Current methods generalize poorly across photorealistic and heterogeneous domains. In this paper, we propose a Diffusion-Information-Di…
External link:
http://arxiv.org/abs/2403.10098
Author:
Liu, Xuannan, Li, Peipei, Huang, Huaibo, Li, Zekun, Cui, Xing, Liang, Jiahao, Qin, Lixiong, Deng, Weihong, He, Zhaofeng
The massive generation of multimodal fake news involving both text and images exhibits substantial distribution discrepancies, prompting the need for generalized detectors. However, the insulated nature of training restricts the capability of classic…
External link:
http://arxiv.org/abs/2403.01988