Visual Question Answering Based on Question Attention Model

Authors: Zhaochang Wu, Huajie Zhang, Yunfang Chen, Jianing Zhang
Year of publication: 2020
Source: Journal of Physics: Conference Series. 1624:022022
ISSN: 1742-6596, 1742-6588
Description: Visual Question Answering (VQA), the task of answering natural-language questions about images, has become popular in the field of artificial intelligence. At present, most VQA models extract features from the whole image, which consumes a large amount of computation and requires a complex structure. In this paper, we propose a VQA method based on a question attention model. First, a Convolutional Neural Network (CNN) is used to extract image features from the input images, and the question text is processed by a Long Short-Term Memory (LSTM) network. Then, we design a question attention module that lets the learning algorithm focus on the most relevant features of the input text. Guided by the question features, our method uses the attention module to assign corresponding weights to the image features and extract the information that matters for generating the words of the answer sequence. Our method performs significantly better than the LSTM Q+I model on the MS COCO VQA dataset, with an accuracy improvement of 2%.
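
To make the described pipeline concrete, the following is a minimal sketch (not the authors' implementation) of question-guided attention over CNN image features, written in PyTorch. All module names, dimensions, and the concatenation-based fusion scheme are illustrative assumptions; for brevity the sketch also predicts a single answer from a fixed answer vocabulary rather than generating an answer word sequence.

    # Minimal illustrative sketch of question-guided attention for VQA.
    # Assumptions (not from the paper): region features from a pretrained CNN,
    # concatenation fusion, and classification over a fixed answer vocabulary.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class QuestionAttentionVQA(nn.Module):
        def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                     img_feat_dim=2048, num_answers=1000):
            super().__init__()
            # LSTM encodes the question tokens into a single question vector.
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            # One attention score per image region, conditioned on the question.
            self.attn = nn.Linear(hidden_dim + img_feat_dim, 1)
            # Classifier over a fixed answer vocabulary (a simplification).
            self.classifier = nn.Linear(hidden_dim + img_feat_dim, num_answers)

        def forward(self, img_feats, question_tokens):
            # img_feats: (B, R, D) region features from a pretrained CNN.
            # question_tokens: (B, T) integer token ids.
            _, (h, _) = self.lstm(self.embed(question_tokens))
            q = h[-1]                                    # (B, H) question feature
            q_tiled = q.unsqueeze(1).expand(-1, img_feats.size(1), -1)
            scores = self.attn(torch.cat([q_tiled, img_feats], dim=-1))
            weights = F.softmax(scores, dim=1)           # attention over regions
            attended = (weights * img_feats).sum(dim=1)  # (B, D) weighted image feature
            return self.classifier(torch.cat([q, attended], dim=-1))

Conditioning the region weights on the LSTM question vector is what would let such a model focus on question-relevant parts of the image instead of processing whole-image features uniformly, which is the efficiency motivation stated in the abstract.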
Database: OpenAIRE