Search results

Report

Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering

Authors: Hao, Dongze; Wang, Qunbo; Guo, Longteng; Jiang, Jie; Liu, Jing

While large pre-trained visual-language models have shown promising results on traditional visual question answering benchmarks, it is still challenging for them to answer complex VQA problems that require diverse world knowledge. Motivated by the

External link: http://arxiv.org/abs/2404.13947

Report

Knowledge Condensation and Reasoning for Knowledge-based VQA

Authors: Hao, Dongze; Jia, Jian; Guo, Longteng; Wang, Qunbo; Yang, Te; Li, Yan; Cheng, Yanhua; Wang, Bo; Chen, Quan; Li, Han; Liu, Jing

Knowledge-based visual question answering (KB-VQA) is a challenging task that requires the model to leverage external knowledge to comprehend and answer questions grounded in visual content. Recent studies retrieve the knowledge passages fro

External link: http://arxiv.org/abs/2403.10037
