Search results - "Zhang, XiaoFeng"

Report

Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Authors: Zhang, Xiaofeng; Quan, Yihao; Gu, Chaochen; Shen, Chen; Yuan, Xiaosong; Yan, Shaotian; Cheng, Hao; Wu, Kaijie; Ye, Jieping

The hallucination problem in multimodal large language models (MLLMs) remains a common issue. Although image tokens occupy a majority of the input sequence of MLLMs, there is limited research to explore the relationship between image tokens and hallu

External link: http://arxiv.org/abs/2411.09968

Report

DDFAV: Remote Sensing Large Vision Language Models Dataset and Evaluation Benchmark

Authors: Li, Haodong; Qu, Haicheng; Zhang, Xiaofeng

With the rapid development of large vision language models (LVLMs), these models have shown excellent results in various multimodal tasks. Since LVLMs are prone to hallucinations and there are currently few datasets and evaluation methods specificall

External link: http://arxiv.org/abs/2411.02733

Report

High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer

Authors: Li, Mingxian; Sun, Hao; Lei, Yingtie; Zhang, Xiaofeng; Dong, Yihang; Zhou, Yilin; Li, Zimeng; Chen, Xuhang

Document images are often degraded by various stains, significantly impacting their readability and hindering downstream applications such as document digitization and analysis. The absence of a comprehensive stained document dataset has limited the

External link: http://arxiv.org/abs/2410.22922

Report

GiVE: Guiding Visual Encoder to Perceive Overlooked Information

Authors: Li, Junjie; Ma, Jianghong; Zhang, Xiaofeng; Li, Yuhang; Shi, Jianyang

Multimodal Large Language Models have advanced AI in applications like text-to-video generation and visual question answering. These models rely on visual encoders to convert non-text data into vectors, but current encoders either lack semantic align

External link: http://arxiv.org/abs/2410.20109

Report

LEIA discovery of the longest-lasting and most energetic stellar X-ray flare ever detected

LEIA (Lobster Eye Imager for Astronomy) detected a new X-ray transient on November 7, 2022, identified as a superflare event occurring on a nearby RS CVn-type binary HD 251108. The flux increase was also detected in follow-up observations at X-ray, U

External link: http://arxiv.org/abs/2410.17999

Report

Bridging the Gap between Text, Audio, Image, and Any Sequence: A Novel Approach using Gloss-based Annotation

Authors: Fang, Sen; Chen, Sizhou; Feng, Yalin; Zhang, Xiaofeng; Teoh, Teik Toe

This paper presents an innovative approach called BGTAI to simplify multimodal understanding by utilizing gloss-based annotation as an intermediate step in aligning Text and Audio with Images. While the dynamic temporal factors in textual and audio i

External link: http://arxiv.org/abs/2410.03146

Report

Instance-adaptive Zero-shot Chain-of-Thought Prompting

Authors: Yuan, Xiaosong; Shen, Chen; Yan, Shaotian; Zhang, Xiaofeng; Xie, Liang; Wang, Wenxiao; Guan, Renchu; Wang, Ying; Ye, Jieping

Zero-shot Chain-of-Thought (CoT) prompting emerges as a simple and effective strategy for enhancing the performance of large language models (LLMs) in real-world reasoning tasks. Nonetheless, the efficacy of a singular, task-level prompt uniformly ap

External link: http://arxiv.org/abs/2409.20441

Report

DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer

Authors: Wei, Jinfeng; Zhang, Xiaofeng

In this work, we introduce DOPRA, a novel approach designed to mitigate hallucinations in multi-modal large language models (MLLMs). Unlike existing solutions that typically involve costly supplementary training data or the integration of external kn

External link: http://arxiv.org/abs/2407.15130

Report

From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks

Authors: Zhang, Xiaofeng; Quan, Yihao; Shen, Chen; Yuan, Xiaosong; Yan, Shaotian; Xie, Liang; Wang, Wenxiao; Gu, Chaochen; Tang, Hao; Ye, Jieping

Large Vision Language Models (LVLMs) achieve great performance on visual-language reasoning tasks, however, the black-box nature of LVLMs hinders in-depth research on the reasoning mechanism. As all images need to be converted into image tokens to fi

External link: http://arxiv.org/abs/2406.06579

Report

Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement

Authors: Bai, Jiesong; Yin, Yuhao; He, Qiyuan; Li, Yuanxian; Zhang, Xiaofeng

In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's p

External link: http://arxiv.org/abs/2405.03349
