Showing 1 - 10 of 134 for search: '"Qu, Xiaoye"'
In recent years, Contrastive Language-Image Pre-training (CLIP) has become a cornerstone in multimodal intelligence. However, recent studies have identified that the information loss in the CLIP encoding process is substantial, and CLIP tends to capture…
External link:
http://arxiv.org/abs/2409.19291
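For context, the CLIP objective this abstract refers to is a symmetric contrastive (InfoNCE) loss over matched image-text pairs. A minimal PyTorch sketch of that standard objective follows; it is illustrative background only, not the method proposed in the paper:

```python
# Minimal sketch of CLIP's symmetric contrastive objective (InfoNCE).
# Illustrative background only; the embeddings are assumed to come from
# separate image and text encoders, which are not shown here.
import torch
import torch.nn.functional as F

def clip_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of matched image-text pairs."""
    # L2-normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j).
    logits = image_emb @ text_emb.t() / temperature
    # The i-th image matches the i-th text, so targets are the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image
    return (loss_i2t + loss_t2i) / 2
```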
Large Vision-Language Models (LVLMs) have become pivotal at the intersection of computer vision and natural language processing. However, the full potential of LVLMs' Retrieval-Augmented Generation (RAG) capabilities remains underutilized. Existing work…
External link:
http://arxiv.org/abs/2409.14083
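As background, a Retrieval-Augmented Generation pipeline of the kind this abstract discusses typically retrieves the top-k passages for a query and conditions generation on them. A minimal sketch follows, where `embed`, `search`, and `generate` are hypothetical stand-ins for an embedding model, a vector index, and an LVLM/LLM decoder; this is a generic illustration, not the paper's system:

```python
# Minimal sketch of a retrieval-augmented generation loop.
# `embed`, `search`, and `generate` are hypothetical callables supplied by
# the caller; they are assumptions for illustration, not APIs from the paper.
from typing import Callable, List

def rag_answer(query: str,
               embed: Callable[[str], List[float]],
               search: Callable[[List[float], int], List[str]],
               generate: Callable[[str], str],
               k: int = 5) -> str:
    """Retrieve top-k passages for the query and condition generation on them."""
    query_vec = embed(query)          # embed the query
    passages = search(query_vec, k)   # nearest-neighbor lookup in the index
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (f"Answer the question using the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
    return generate(prompt)           # decode conditioned on retrieved context
```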
Recently, Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multi-modal context comprehension. However, they still suffer from hallucination problems, i.e., generating outputs inconsistent with the image content.
External link:
http://arxiv.org/abs/2408.17150
Author:
Su, Zhaochen, Zhang, Jun, Qu, Xiaoye, Zhu, Tong, Li, Yanshu, Sun, Jiashuo, Li, Juntao, Zhang, Min, Cheng, Yu
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between…
External link:
http://arxiv.org/abs/2408.12076
Despite the remarkable ability of large vision-language models (LVLMs) in image comprehension, these models frequently generate plausible yet factually incorrect responses, a phenomenon known as hallucination. Recently, in large language models (LLMs)…
External link:
http://arxiv.org/abs/2408.00555
While Large Vision-Language Models (LVLMs) have exhibited remarkable capabilities across a wide range of tasks, they suffer from hallucination problems, where models generate plausible yet incorrect answers given the input image-query pair. This hallucination…
External link:
http://arxiv.org/abs/2408.00550
With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language…
External link:
http://arxiv.org/abs/2407.07403
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, training MoE from scratch in a large-scale setting still suffers from data hunger and instability. Motivated…
External link:
http://arxiv.org/abs/2406.16554
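For reference, the Mixture-of-Experts framework named in this abstract routes each token to a small subset of expert feed-forward networks via a learned gate. Below is a minimal top-k routing sketch in PyTorch; it is a generic illustration of sparse expert routing, not the training recipe proposed in the paper:

```python
# Minimal sketch of a top-k routed Mixture-of-Experts layer.
# Generic illustration of sparse expert routing, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is sent to its top-k experts.
        scores = self.gate(x)                                # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # (tokens, k)
        weights = F.softmax(topk_scores, dim=-1)             # renormalize over k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```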
Reasoning about time is essential for Large Language Models (LLMs) to understand the world. Previous works focus on solving specific tasks, primarily time-sensitive question answering. While these methods have proven effective, they cannot generalize…
External link:
http://arxiv.org/abs/2406.14192
Mixture-of-Experts (MoE) models have shown remarkable capability in instruction tuning, especially as the number of tasks scales. However, previous methods simply merge all training tasks (e.g., creative writing, coding, and mathematics) and apply f…
External link:
http://arxiv.org/abs/2406.11256