Showing 1-10 of 5,428 results
for search: '"Zhang, XiaoFeng"'
Author:
Zhang, Xiaofeng, Quan, Yihao, Gu, Chaochen, Shen, Chen, Yuan, Xiaosong, Yan, Shaotian, Cheng, Hao, Wu, Kaijie, Ye, Jieping
The hallucination problem in multimodal large language models (MLLMs) remains a common issue. Although image tokens occupy a majority of the input sequence of MLLMs, there is limited research to explore the relationship between image tokens and hallu
External link:
http://arxiv.org/abs/2411.09968
With the rapid development of large vision language models (LVLMs), these models have shown excellent results in various multimodal tasks. Since LVLMs are prone to hallucinations and there are currently few datasets and evaluation methods specificall
External link:
http://arxiv.org/abs/2411.02733
Author:
Li, Mingxian, Sun, Hao, Lei, Yingtie, Zhang, Xiaofeng, Dong, Yihang, Zhou, Yilin, Li, Zimeng, Chen, Xuhang
Document images are often degraded by various stains, significantly impacting their readability and hindering downstream applications such as document digitization and analysis. The absence of a comprehensive stained document dataset has limited the
External link:
http://arxiv.org/abs/2410.22922
Multimodal Large Language Models have advanced AI in applications like text-to-video generation and visual question answering. These models rely on visual encoders to convert non-text data into vectors, but current encoders either lack semantic align
External link:
http://arxiv.org/abs/2410.20109
Author:
Mao, Xuan, Liu, He-Yang, Wang, Song, Ling, Zhixing, Yuan, Weimin, Cheng, Huaqing, Pan, Haiwu, Li, Dongyue, Favata, Fabio, Ji, Tuo, Zhang, Jujia, Zhao, Xinlin, Wan, Jing, Cai, Zhiming, Castro-Tirado, Alberto J., Dai, Yanfeng, Deng, Licai, Ding, Xu, Ji, Kaifan, Jin, Chichuan, Lei, Yajuan, Li, Huali, Lin, Jun, Liu, Huaqiu, Liu, Mingjun, Liu, Shuai, Liu, Yuan, Sun, Hui, Sun, Shengli, Sun, Xiaojin, Shi, Jianrong, Wang, Jianguo, Wang, Jingxiu, Wang, Wenxin, Wei, Jianyan, Xin, Liping, Xiong, Dingrong, Zhang, Chen, Zhang, Wenda, Zhang, Yonghe, Zhang, Xiaofeng, Zhao, Donghua, Zhou, Guiping
LEIA (Lobster Eye Imager for Astronomy) detected a new X-ray transient on November 7, 2022, identified as a superflare event occurring on a nearby RS CVn-type binary HD 251108. The flux increase was also detected in follow-up observations at X-ray, U
External link:
http://arxiv.org/abs/2410.17999
This paper presents an innovative approach called BGTAI to simplify multimodal understanding by utilizing gloss-based annotation as an intermediate step in aligning Text and Audio with Images. While the dynamic temporal factors in textual and audio i
External link:
http://arxiv.org/abs/2410.03146
Author:
Yuan, Xiaosong, Shen, Chen, Yan, Shaotian, Zhang, Xiaofeng, Xie, Liang, Wang, Wenxiao, Guan, Renchu, Wang, Ying, Ye, Jieping
Zero-shot Chain-of-Thought (CoT) prompting emerges as a simple and effective strategy for enhancing the performance of large language models (LLMs) in real-world reasoning tasks. Nonetheless, the efficacy of a singular, task-level prompt uniformly ap
External link:
http://arxiv.org/abs/2409.20441
Author:
Wei, Jinfeng, Zhang, Xiaofeng
In this work, we introduce DOPRA, a novel approach designed to mitigate hallucinations in multi-modal large language models (MLLMs). Unlike existing solutions that typically involve costly supplementary training data or the integration of external kn
External link:
http://arxiv.org/abs/2407.15130
Author:
Zhang, Xiaofeng, Quan, Yihao, Shen, Chen, Yuan, Xiaosong, Yan, Shaotian, Xie, Liang, Wang, Wenxiao, Gu, Chaochen, Tang, Hao, Ye, Jieping
Large Vision Language Models (LVLMs) achieve great performance on visual-language reasoning tasks, however, the black-box nature of LVLMs hinders in-depth research on the reasoning mechanism. As all images need to be converted into image tokens to fi
External link:
http://arxiv.org/abs/2406.06579
In the field of low-light image enhancement, both traditional Retinex methods and advanced deep learning techniques such as Retinexformer have shown distinct advantages and limitations. Traditional Retinex methods, designed to mimic the human eye's p
External link:
http://arxiv.org/abs/2405.03349