Depth-Guided AdaIN and Shift Attention Network for Vision-And-Language Navigation

Authors: Qiang Sun, Yanwei Fu, Xiangyang Xue, Yifeng Zhuang, Zhengqing Chen
Year of publication: 2021
Subject:
Source: 2021 IEEE International Conference on Multimedia and Expo (ICME).
DOI: 10.1109/icme51207.2021.9428422
Description: Vision-and-Language Navigation (VLN) is a grand goal of AI, in which an agent acts on natural-language instructions from humans. In the VLN task, the agent learns to search for a specific region described by the instructions in the training environments and then performs navigation in unseen environments. Normally, there exists a large domain gap between the seen and unseen environments. Numerous works have focused on data augmentation and on designing new losses for this multi-task navigation setting. However, although navigation is a spatial and temporal search task, a valuable signal source – depth – has not yet been fully explored and has thus been ignored in previous efforts. Moreover, current models lack the ability to capture the relative spatial directions to the grounding view. To address these issues, we propose an environment-adaptive method based on a Depth-Guided Adaptive Instance Normalization (DG-AdaIN) module that adjusts the RGB features in terms of the depth features, and we develop a shift attention module to model the relative directional information in the attention map. Extensive experiments validate the efficacy of our method on the benchmark dataset.
Database: OpenAIRE
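
The abstract describes DG-AdaIN only at a high level. The following is a minimal, hypothetical PyTorch sketch of a depth-guided AdaIN layer, assuming the standard AdaIN formulation in which the RGB features are instance-normalized and then modulated by per-channel affine parameters predicted from the depth features; the class name, feature shapes, and projection layers are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn


class DepthGuidedAdaIN(nn.Module):
    """Hypothetical depth-guided AdaIN layer (names and shapes are assumptions).

    RGB features are instance-normalized, then re-scaled and shifted with
    per-channel affine parameters predicted from pooled depth features, so the
    visual representation adapts to the geometry of the current environment.
    """

    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        # Depth features predict per-channel scale (gamma) and shift (beta).
        self.to_gamma = nn.Linear(channels, channels)
        self.to_beta = nn.Linear(channels, channels)

    def forward(self, rgb_feat, depth_feat):
        # rgb_feat:   (B, C, H, W) convolutional features of the RGB view
        # depth_feat: (B, C) pooled features of the matching depth map
        mean = rgb_feat.mean(dim=(2, 3), keepdim=True)
        std = rgb_feat.std(dim=(2, 3), keepdim=True) + self.eps
        normalized = (rgb_feat - mean) / std           # instance normalization
        gamma = self.to_gamma(depth_feat)[:, :, None, None]
        beta = self.to_beta(depth_feat)[:, :, None, None]
        return gamma * normalized + beta               # depth-conditioned affine


if __name__ == "__main__":
    # Toy usage: 2 views, 512-channel RGB features, pooled depth features.
    layer = DepthGuidedAdaIN(channels=512)
    rgb = torch.randn(2, 512, 7, 7)
    depth = torch.randn(2, 512)
    print(layer(rgb, depth).shape)  # torch.Size([2, 512, 7, 7])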