VQA Model Based on Image Descriptive Paragraph and Deep Integration of BERT

Authors: Jianing Zhang, Huajie Zhang, Zhaochang Wu, Yunfang Chen
Publication year: 2020
Subject:
Source: Journal of Physics: Conference Series. 1624:022014
ISSN: 1742-6596, 1742-6588
DOI: 10.1088/1742-6596/1624/2/022014
Description: Visual Question Answering (VQA) is a fast-developing, multidisciplinary field that continually takes on more complex tasks. The classic CNN+LSTM combination can effectively extract image and language representations to complete the VQA task, but problems remain, such as the handling of excessively long sequences. In recent years, the BERT model has expanded rapidly from natural language processing into the broader multi-modal field thanks to its strong learning ability. In this paper, we propose a novel way to apply the BERT model to VQA. We use descriptive paragraph generation to transform the image into a text paragraph, and integrate the question information and image information in the BERT model. Our model achieves excellent performance on the VQA2.0 dataset, with an overall accuracy 5% higher than previous models.
Database: OpenAIRE