VQA Model Based on Image Descriptive Paragraph and Deep Integration of BERT
Author: Jianing Zhang, Huajie Zhang, Zhaochang Wu, Yunfang Chen
Year of publication: 2020
Source: Journal of Physics: Conference Series. 1624:022014
ISSN: 1742-6596, 1742-6588
DOI: 10.1088/1742-6596/1624/2/022014
Description: Visual Question Answering (VQA) is a fast-developing field involving multiple disciplines, and it is continually taking on more complex tasks. The classic CNN+LSTM combination can effectively extract image and language representations to complete the VQA task, but problems remain, such as the handling of excessively long sequences. In recent years, the BERT model has expanded rapidly from natural language processing into the broader multi-modal field thanks to its strong learning ability. In this paper, we propose a novel way to apply the BERT model to VQA. We use descriptive-paragraph generation to transform the picture into a textual paragraph description, and we integrate the question information and image information within the BERT model. Our model achieves excellent performance on the VQA 2.0 dataset, with an overall accuracy 5% higher than previous models.
Database: OpenAIRE
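The description above outlines a two-stage pipeline: an image-paragraph-captioning model turns the picture into a textual description, and BERT then fuses that paragraph with the question to predict an answer. The following is a minimal sketch of the second stage only, assuming the question and the generated paragraph are encoded as a BERT sentence pair and classified over a fixed answer vocabulary; the checkpoint name, answer set, and example paragraph are illustrative placeholders, not the authors' released code.

```python
# Sketch: encode (question, image paragraph) with BERT and classify an answer.
# The model name, answer vocabulary, and paragraph below are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical answer vocabulary; VQA 2.0 systems typically use ~3000 answers.
ANSWERS = ["yes", "no", "red", "blue", "one", "two", "dog", "cat"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(ANSWERS)
)
model.eval()

# Stand-in for the output of an image-paragraph-captioning model.
image_paragraph = (
    "A small dog sits on a red couch next to a window. "
    "Sunlight falls across the cushions and a blue ball lies on the floor."
)
question = "What color is the couch?"

# BERT sentence-pair input: [CLS] question [SEP] paragraph [SEP]
inputs = tokenizer(
    question,
    image_paragraph,
    return_tensors="pt",
    truncation=True,
    max_length=256,
)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, len(ANSWERS))

print(ANSWERS[int(logits.argmax(dim=-1))])  # arbitrary until the head is trained
```

In such a setup, the classification head would be fine-tuned on (question, generated paragraph, answer) triples built from the VQA 2.0 training split before the predictions become meaningful.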