Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering

Authors:	Zhongjian Hu, Peng Yang, Fengyuan Liu, Yuan Meng, Xingyu Liu
Language:	English
Year of publication:	2024
Subject:	visual question answering; knowledge-based visual question answering; large language model; knowledge injection; Electronic computers. Computer science; QA75.5-76.95
Source:	Big Data Mining and Analytics, Vol 7, Iss 3, Pp 843-857 (2024)
Document type:	article
ISSN:	2096-0654
DOI:	10.26599/BDMA.2024.9020026
Description:	Previous works employ Large Language Models (LLMs) such as GPT-3 for knowledge-based Visual Question Answering (VQA). We argue that the inferential capacity of an LLM can be enhanced through knowledge injection. Although methods that use knowledge graphs to enhance LLMs have been explored in various tasks, they may have limitations, such as failing to retrieve the required knowledge. In this paper, we introduce a novel framework for knowledge-based VQA titled “Prompting Large Language Models with Knowledge-Injection” (PLLMKI). We use a vanilla VQA model to inspire the LLM and further enhance the LLM with knowledge injection. Unlike earlier approaches, we adopt the LLM for knowledge enhancement instead of relying on knowledge graphs. Furthermore, we leverage open LLMs, incurring no additional costs. Compared with existing baselines, our approach improves accuracy by over 1.3 and 1.7 on two knowledge-based VQA datasets, OK-VQA and A-OKVQA, respectively.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/4a5c02e765b940a88f6ed65e4d31f4d3