Recurrent Attention Network with Reinforced Generator for Visual Dialog
Author: | Linchao Zhu, Yi Yang, Hehe Fan, Fei Wu |
Year of publication: | 2020 |
Subject: | Spatial contextual awareness; Parsing; Deep learning; Memorization; Discriminative model; Reinforcement learning; Artificial intelligence; Natural language processing |
Source: | ACM Transactions on Multimedia Computing, Communications, and Applications. 16:1-16 |
ISSN: | 1551-6865 1551-6857 |
DOI: | 10.1145/3390891 |
Description: | In Visual Dialog, an agent has to parse the temporal context in the dialog history and the spatial context in the image to hold a meaningful dialog with humans. For example, to answer "what is the man on her left wearing?" the agent needs to (1) analyze the temporal context in the dialog history to infer who is being referred to as "her," (2) parse the image to attend to "her," and (3) uncover the spatial context to shift the attention to "her left" and check the apparel of the man. In this article, we use a dialog network to memorize the temporal context and an attention processor to parse the spatial context. Since the question and the image are usually complex, the question is difficult to ground in a single glimpse, so the attention processor attends to the image multiple times to collect visual information more thoroughly. In the Visual Dialog task, the generative decoder (G) is trained under the word-by-word paradigm, which suffers from the lack of sentence-level training. To ameliorate this problem, we propose to reinforce G at the sentence level using the discriminative model (D), which aims to select the right answer from a few candidates. Experimental results on the VisDial dataset demonstrate the effectiveness of our approach. |
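The multi-glimpse idea in the abstract can be sketched as iterated attention: score each image region against a query, pool the regions by the resulting attention weights, and fold the glimpse back into the query before attending again. The sketch below is a minimal illustration under assumed shapes and an assumed additive query-update rule; the function name, per-glimpse weight matrices `W`, and update are hypothetical, not the paper's actual attention processor.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_glimpse_attention(regions, query, W, n_glimpses=3):
    """Attend to image region features several times (hypothetical sketch).

    regions: (num_regions, dim) image region features
    query:   (dim,) question/dialog encoding
    W:       (n_glimpses, dim, dim) per-glimpse projection matrices (assumed)
    """
    q = query
    for t in range(n_glimpses):
        scores = regions @ (W[t] @ q)   # one relevance score per region
        alpha = softmax(scores)         # attention distribution over regions
        glimpse = alpha @ regions       # attention-weighted visual summary
        q = q + glimpse                 # refine the query with the glimpse
    return q, alpha
```

Each pass can shift the attention distribution (e.g. from "her" toward "her left"), which is the motivation the abstract gives for attending more than once.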
Database: | OpenAIRE |
External link: |