Showing 1 - 10 of 294 for search: '"Yan, Yibo"'
In recent years, multimodal large language models (MLLMs) have significantly advanced, integrating more modalities into diverse applications. However, the lack of explainability remains a major barrier to their use in scenarios requiring decision tra…
External link: http://arxiv.org/abs/2410.04819
Multimodal Large Language Models (MLLMs) have emerged as a central focus in both industry and academia, but often suffer from biases introduced by visual and language priors, which can lead to multimodal hallucination. These biases arise from the vis…
External link: http://arxiv.org/abs/2410.04780
Author:
Yan, Yibo, Wang, Shen, Huo, Jiahao, Li, Hang, Li, Boyan, Su, Jiamin, Gao, Xiong, Zhang, Yi-Fan, Xu, Tianlong, Chu, Zhendong, Zhong, Aoxiao, Wang, Kun, Xiong, Hui, Yu, Philip S., Hu, Xuming, Wen, Qingsong
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their potential to revolutionize artificial intelligence is particularly promising, especially in addressing mathematical reasoning tasks. Current mathematical benchmarks p…
External link: http://arxiv.org/abs/2410.04509
Author:
Zou, Xin, Wang, Yizhou, Yan, Yibo, Huang, Sirui, Zheng, Kening, Chen, Junkai, Tang, Chang, Hu, Xuming
Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) are susceptible to hallucinations, especially assertively fabricating content not present in the visual inputs. To address the aforementioned challenge, we follow a commo…
External link: http://arxiv.org/abs/2410.03577
In human reading and communication, individuals tend to engage in geospatial reasoning, which involves recognizing geographic entities and making informed inferences about their interrelationships. To mimic such a cognitive process, current methods eit…
External link: http://arxiv.org/abs/2408.11366
Hallucination issues persistently plague current multimodal large language models (MLLMs). While existing research primarily focuses on object-level or attribute-level hallucinations, sidelining the more sophisticated relation hallucinations that ne…
External link: http://arxiv.org/abs/2408.09429
Author:
Zhu, Junyi, Liu, Shuochen, Yu, Yu, Tang, Bo, Yan, Yibo, Li, Zhiyu, Xiong, Feiyu, Xu, Tong, Blaschko, Matthew B.
Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to en…
External link: http://arxiv.org/abs/2406.16069
MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model
Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identif…
External link: http://arxiv.org/abs/2406.11193
Author:
Zou, Xingchen, Huang, Jiani, Hao, Xixuan, Yang, Yuhao, Wen, Haomin, Yan, Yibo, Huang, Chao, Liang, Yuxuan
Learning effective geospatial embeddings is crucial for a series of geospatial applications such as city analytics and earth monitoring. However, learning comprehensive region representations presents two significant challenges: first, the deficiency…
External link: http://arxiv.org/abs/2405.14135
Urbanization challenges underscore the necessity for effective satellite image-text retrieval methods to swiftly access specific information enriched with geographic semantics for urban applications. However, existing methods often overlook significa…
External link: http://arxiv.org/abs/2404.14241