Visual Question Answering Based on Question Attention Model
| Author: | Zhaochang Wu, Huajie Zhang, Yunfang Chen, Jianing Zhang |
| --- | --- |
| Year of publication: | 2020 |
| Subject: | |
| Source: | Journal of Physics: Conference Series. 1624:022022 |
| ISSN: | 1742-6596, 1742-6588 |
| Description: | Visual Question Answering (VQA), the task of answering natural language questions about images, has become popular in the field of artificial intelligence. At present, most VQA models extract features from the whole image, which consumes a large amount of computation and results in a complex structure. In this paper, we propose a VQA method based on a question attention model. First, a Convolutional Neural Network (CNN) extracts image features from the input images, and the question text is processed by a Long Short-Term Memory (LSTM) network. Then, we design a question attention module that lets the learning algorithm focus on the most relevant features of the input text. Guided by the question features, our method uses the attention module to apply the corresponding weights to the image features and extract the information needed to generate the words of the answer sequence. Our method performs significantly better than the LSTM Q+I model on the MS COCO VQA dataset, with an accuracy improvement of 2%. A sketch of this pipeline appears after the record. |
| Database: | OpenAIRE |
| External link: | |
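
The description outlines a three-part pipeline: a CNN produces image features, an LSTM encodes the question, and a question-guided attention module weights the image features before answer prediction. Below is a minimal sketch of that pipeline, assuming PyTorch; all layer names, dimensions, and the additive attention scoring are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of a question-attention VQA model (not the authors' code).
# Image features are assumed precomputed by a CNN as a grid of region vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionAttentionVQA(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=512,
                 img_dim=512, num_answers=1000):
        super().__init__()
        # Question encoder: word embeddings fed to an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Projections that score each image region against the question.
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.att_score = nn.Linear(hidden_dim, 1)
        # Classifier over a fixed answer vocabulary.
        self.classifier = nn.Linear(hidden_dim * 2, num_answers)

    def forward(self, img_feats, question):
        # img_feats: (B, R, img_dim) -- R region vectors from a pretrained CNN
        # question:  (B, T) token ids
        _, (h, _) = self.lstm(self.embed(question))
        q = h[-1]                                   # (B, hidden_dim)
        # Question-guided additive attention over image regions.
        proj = self.img_proj(img_feats)             # (B, R, hidden_dim)
        scores = self.att_score(torch.tanh(proj + q.unsqueeze(1)))  # (B, R, 1)
        alpha = F.softmax(scores, dim=1)            # attention weights per region
        attended = (alpha * proj).sum(dim=1)        # (B, hidden_dim)
        # Fuse attended image features with the question and classify.
        return self.classifier(torch.cat([attended, q], dim=1))

# Usage: a batch of 2 images with 49 regions (e.g. a 7x7 CNN feature grid).
model = QuestionAttentionVQA(vocab_size=10000)
img = torch.randn(2, 49, 512)
qs = torch.randint(0, 10000, (2, 12))
logits = model(img, qs)                             # (2, 1000) answer scores
```

The design choice worth noting is that the attention weights are conditioned on the question encoding, so only question-relevant regions contribute to the fused representation, which is the efficiency argument the abstract makes against processing whole-image features uniformly.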