VQA Model Based on Image Descriptive Paragraph and Deep Integration of BERT
Author: Jianing Zhang, Huajie Zhang, Zhaochang Wu, Yunfang Chen
Year of publication: 2020
Source: Journal of Physics: Conference Series. 1624:022014
ISSN: 1742-6596, 1742-6588
DOI: 10.1088/1742-6596/1624/2/022014
Description: Visual Question Answering (VQA) is a fast-developing field involving multiple disciplines, and it is continually taking on more complex tasks. The classic CNN+LSTM combination can effectively extract image and language representations to complete the VQA task, but problems remain, such as the handling of excessively long sequences. In recent years, the BERT model has expanded rapidly from natural language processing into the broader multi-modal field thanks to its strong learning ability. In this paper, we propose a novel way to apply the BERT model to VQA. We use descriptive-paragraph generation to transform the picture into a textual paragraph description, and we integrate the question information and image information within the BERT model. Our model achieves excellent performance on the VQA 2.0 dataset, with an overall accuracy 5% higher than previous models.
Database: OpenAIRE
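The description above outlines a two-stage pipeline: an image-paragraph-captioning model turns the picture into a textual description, and BERT then fuses that paragraph with the question to predict an answer. The following is a minimal sketch of the second stage only, assuming the question and the generated paragraph are encoded as a BERT sentence pair and classified over a fixed answer vocabulary; the checkpoint name, answer set, and example paragraph are illustrative placeholders, not the authors' released code.

```python
# Sketch: encode (question, image paragraph) with BERT and classify an answer.
# The model name, answer vocabulary, and paragraph below are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Hypothetical answer vocabulary; VQA 2.0 systems typically use ~3000 answers.
ANSWERS = ["yes", "no", "red", "blue", "one", "two", "dog", "cat"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(ANSWERS)
)
model.eval()

# Stand-in for the output of an image-paragraph-captioning model.
image_paragraph = (
    "A small dog sits on a red couch next to a window. "
    "Sunlight falls across the cushions and a blue ball lies on the floor."
)
question = "What color is the couch?"

# BERT sentence-pair input: [CLS] question [SEP] paragraph [SEP]
inputs = tokenizer(
    question,
    image_paragraph,
    return_tensors="pt",
    truncation=True,
    max_length=256,
)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, len(ANSWERS))

print(ANSWERS[int(logits.argmax(dim=-1))])  # arbitrary until the head is trained
```

In such a setup, the classification head would be fine-tuned on (question, generated paragraph, answer) triples built from the VQA 2.0 training split before the predictions become meaningful.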