Showing 1 - 10
of 301
for the search: '"Yan Yibo"'
Ensuring that Multimodal Large Language Models (MLLMs) maintain consistency in their responses is essential for developing trustworthy multimodal intelligence. However, existing benchmarks include many samples where all MLLMs exhibit high res
External link:
http://arxiv.org/abs/2411.02708
In recent years, multimodal large language models (MLLMs) have significantly advanced, integrating more modalities into diverse applications. However, the lack of explainability remains a major barrier to their use in scenarios requiring decision tra
External link:
http://arxiv.org/abs/2410.04819
Multimodal Large Language Models (MLLMs) have emerged as a central focus in both industry and academia, but often suffer from biases introduced by visual and language priors, which can lead to multimodal hallucination. These biases arise from the vis
External link:
http://arxiv.org/abs/2410.04780
Author:
Yan, Yibo, Wang, Shen, Huo, Jiahao, Li, Hang, Li, Boyan, Su, Jiamin, Gao, Xiong, Zhang, Yi-Fan, Xu, Tianlong, Chu, Zhendong, Zhong, Aoxiao, Wang, Kun, Xiong, Hui, Yu, Philip S., Hu, Xuming, Wen, Qingsong
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their potential to revolutionize artificial intelligence is particularly promising, especially in addressing mathematical reasoning tasks. Current mathematical benchmarks p
External link:
http://arxiv.org/abs/2410.04509
Author:
Zou, Xin, Wang, Yizhou, Yan, Yibo, Huang, Sirui, Zheng, Kening, Chen, Junkai, Tang, Chang, Hu, Xuming
Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) are susceptible to hallucinations, especially assertively fabricating content not present in the visual inputs. To address the aforementioned challenge, we follow a commo
External link:
http://arxiv.org/abs/2410.03577
In human reading and communication, individuals tend to engage in geospatial reasoning, which involves recognizing geographic entities and making informed inferences about their interrelationships. To mimic such a cognitive process, current methods eit
External link:
http://arxiv.org/abs/2408.11366
Hallucination issues have persistently plagued current multimodal large language models (MLLMs). While existing research primarily focuses on object-level or attribute-level hallucinations, sidelining the more sophisticated relation hallucinations that ne
External link:
http://arxiv.org/abs/2408.09429
Published in:
Ecological Indicators, Vol 133, Pp 108380 (2021)
In the present study, the urban agglomeration on the northern slope of Tianshan Mountain (UANST) in Xinjiang was taken as the research object. Principal component analysis, coefficient of variation and analytic hierarchy process were used to analyse
External link:
https://doaj.org/article/2dfa941e037c44fa97352b4b1c13dbaa
Author:
Zhu, Junyi, Liu, Shuochen, Yu, Yu, Tang, Bo, Yan, Yibo, Li, Zhiyu, Xiong, Feiyu, Xu, Tong, Blaschko, Matthew B.
Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to en
External link:
http://arxiv.org/abs/2406.16069
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identif
External link:
http://arxiv.org/abs/2406.11193