Attention on Attention: Architectures for Visual Question Answering (VQA)

Authors:	Singh, Jasdeep; Ying, Vincent; Nutkiewicz, Alex
Year of publication:	2018
Subject:	Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; 68Txx
Document type:	Working Paper
Description:	Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring the integration of natural language processing and computer vision modules into a single architecture. We build upon the model that placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. Through 300 GPU hours of extensive hyperparameter and architecture searches, we achieved an evaluation score of 64.78%, outperforming the existing state-of-the-art single model's validation score of 63.15%.
Comment:	Visual Question Answering Project
Database:	arXiv
External link:	http://arxiv.org/abs/1803.07724