Search results - "Yao, Huanjin"

Report

Authors: Yao, Huanjin; Wu, Wenhao; Yang, Taojiannan; Song, YuXin; Zhang, Mengxi; Feng, Haocheng; Sun, Yifan; Li, Zhiheng; Ouyang, Wanli; Wang, Jingdong

Do we fully leverage the potential of visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current MLL

External link: http://arxiv.org/abs/2405.13800

Report

Automated Multi-level Preference for MLLMs

Authors: Zhang, Mengxi; Wu, Wenhao; Lu, Yu; Song, Yuxin; Rong, Kang; Yao, Huanjin; Zhao, Jianbo; Liu, Fanglong; Sun, Yifan; Feng, Haocheng; Wang, Jingdong

Current multimodal Large Language Models (MLLMs) suffer from "hallucination", occasionally generating responses that are not grounded in the input images. To tackle this challenge, one promising path is to utilize reinforcement learning from human

External link: http://arxiv.org/abs/2405.11165

Report

GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?

Authors: Wu, Wenhao; Yao, Huanjin; Zhang, Mengxi; Song, Yuxin; Ouyang, Wanli; Wang, Jingdong

This paper does not present a novel method. Instead, it delves into an essential, yet must-know baseline in light of the latest advancements in Generative Artificial Intelligence (GenAI): the utilization of GPT-4 for visual understanding. Our study c

External link: http://arxiv.org/abs/2311.15732
