End-to-End Recurrent Cross-Modality Attention for Video Dialogue

Authors: Yun-Wei Chu, Lun-Wei Ku, Kuan-Yen Lin, Chao-Chun Hsu
Year of publication: 2021
Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 29:2456-2464
ISSN: 2329-9304, 2329-9290
Description: Visual dialogue systems need to understand dynamic visual scenes and comprehend semantics in order to converse with users. Constructing video dialogue systems is more challenging than constructing traditional image dialogue systems because the large feature space of videos makes it difficult to capture semantic information. Furthermore, the dialogue system must also answer users’ questions precisely, based on a comprehensive understanding of the video and the previous dialogue. To improve the performance of video dialogue systems, we propose an end-to-end recurrent cross-modality attention (ReCMA) model that answers a series of questions about a video using both the visual and textual modalities. At each step of the reasoning process, the answer representation of the question is updated from both the visual and the textual representations, yielding a better understanding of the information in both modalities. We evaluate our method on the challenging DSTC7 video scene-aware dialog dataset, where the proposed ReCMA achieves a relative 20.8% improvement over the baseline on CIDEr.
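The abstract only outlines the reasoning loop, so the following is a minimal PyTorch sketch of one possible reading of a recurrent cross-modality attention step: at each step, the answer state (initialized from the question) attends over visual features and over textual dialogue-history features, and the two attended summaries update the answer state. All module names, dimensions, and the GRU-based update are assumptions made for illustration, not the authors' published implementation.

```python
# Illustrative sketch only: one interpretation of a recurrent
# cross-modality attention (ReCMA-style) reasoning loop. Module names,
# dimensions, and the GRUCell-based update are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalityAttention(nn.Module):
    """Attend over one modality's features using the answer state as the query."""

    def __init__(self, dim: int):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)
        self.key_proj = nn.Linear(dim, dim)

    def forward(self, answer_state: torch.Tensor, features: torch.Tensor) -> torch.Tensor:
        # answer_state: (batch, dim); features: (batch, num_items, dim)
        q = self.query_proj(answer_state).unsqueeze(1)        # (batch, 1, dim)
        k = self.key_proj(features)                           # (batch, num_items, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5          # (batch, num_items)
        weights = F.softmax(scores, dim=-1).unsqueeze(-1)     # (batch, num_items, 1)
        return (weights * features).sum(1)                    # (batch, dim)


class RecurrentCrossModalityReasoner(nn.Module):
    """Run several reasoning steps; each step refines the answer state
    from attended visual and textual summaries."""

    def __init__(self, dim: int, num_steps: int = 3):
        super().__init__()
        self.num_steps = num_steps
        self.visual_attn = CrossModalityAttention(dim)
        self.text_attn = CrossModalityAttention(dim)
        self.update = nn.GRUCell(2 * dim, dim)

    def forward(self, question: torch.Tensor,
                visual_feats: torch.Tensor,
                text_feats: torch.Tensor) -> torch.Tensor:
        answer_state = question                               # initialize from the question
        for _ in range(self.num_steps):
            v = self.visual_attn(answer_state, visual_feats)  # attended visual summary
            t = self.text_attn(answer_state, text_feats)      # attended dialogue-history summary
            answer_state = self.update(torch.cat([v, t], dim=-1), answer_state)
        return answer_state


if __name__ == "__main__":
    batch, dim = 2, 256
    model = RecurrentCrossModalityReasoner(dim)
    question = torch.randn(batch, dim)
    visual_feats = torch.randn(batch, 40, dim)   # e.g. per-frame video features
    text_feats = torch.randn(batch, 10, dim)     # e.g. dialogue-history sentence features
    print(model(question, visual_feats, text_feats).shape)    # torch.Size([2, 256])
```

In this sketch the answer state doubles as the attention query, so each reasoning step re-reads both modalities conditioned on what has been inferred so far; the number of steps is a hyperparameter here, not a value taken from the paper.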
Database: OpenAIRE