Showing 1 - 1 of 1 for search: '"Ming, Tianshi"'
Despite the great success of Large Vision-Language Models (LVLMs), they inevitably suffer from hallucination. As we know, both the visual encoder and the Large Language Model (LLM) decoder in LVLMs are Transformer-based, allowing the model to extract
External link:
http://arxiv.org/abs/2410.04514