KVQA: Knowledge-Aware Visual Question Answering
Author: | Naganand Yadati, Sanket Shah, Anand Mishra, Partha Pratim Talukdar |
---|---|
Year of publication: | 2019 |
Subject: | Information retrieval, Knowledge graph, Commonsense knowledge, Computer science, Question answering, Proper noun, Artificial intelligence & image processing |
Source: | AAAI; Scopus-Elsevier |
ISSN: | 2374-3468, 2159-5399 |
DOI: | 10.1609/aaai.v33i01.33018876 |
Description: | Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing, and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image that can be answered purely from its content. For example, given an image with people in it, a typical VQA question may ask how many people are in the image. More recently, there has been growing interest in answering questions that require commonsense knowledge about common nouns (e.g., cats, dogs, microphones) present in the image. Despite this progress, the important problem of answering questions that require world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has not been addressed in prior research. We address this gap and introduce KVQA, the first dataset for the task of (world) knowledge-aware VQA. KVQA consists of 183K question-answer pairs involving more than 18K named entities and 24K images. Questions in this dataset require multi-entity, multi-relation, and multi-hop reasoning over large Knowledge Graphs (KGs) to arrive at an answer. To the best of our knowledge, KVQA is the largest dataset for exploring VQA over KGs. Further, we provide baseline performances of state-of-the-art methods on KVQA. |
Database: | OpenAIRE |
External link: |
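
The abstract's claim that KVQA questions need multi-entity, multi-hop reasoning over a Knowledge Graph is easiest to see with a toy example. The sketch below is illustrative only: the triples, relation names, and helper functions are hypothetical and not taken from the paper or its code. It shows how a two-hop question such as "Where did the spouse of the person in the image study?" reduces to following a chain of relations once the person is visually linked to a KG entity.

```python
from typing import List, Tuple

# Toy KG as (subject, relation, object) triples. Entities and relation
# names are illustrative; the real dataset grounds faces in images to
# entities in a large KG.
TRIPLES: List[Tuple[str, str, str]] = [
    ("Barack Obama", "spouse", "Michelle Obama"),
    ("Michelle Obama", "educated_at", "Princeton University"),
    ("Barack Obama", "born_in", "Honolulu"),
]

def neighbors(entity: str, relation: str) -> List[str]:
    """Objects reachable from `entity` in one hop via `relation`."""
    return [o for s, r, o in TRIPLES if s == entity and r == relation]

def multi_hop(entity: str, relations: List[str]) -> List[str]:
    """Follow a chain of relations, one hop per relation, from `entity`."""
    frontier = [entity]
    for relation in relations:
        frontier = [o for e in frontier for o in neighbors(e, relation)]
    return frontier

# Two-hop question: after the person in the image is linked to
# 'Barack Obama', the question becomes a spouse -> educated_at chain.
print(multi_hop("Barack Obama", ["spouse", "educated_at"]))
# -> ['Princeton University']
```

Real KVQA questions may additionally involve multiple entities in one image and relations among them, but the core pattern is the same: visual entity linking followed by relational traversal of the KG.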