Showing 1 - 5 of 5 for search: '"Abdessaied, Adnen"'
We present MST-MIXER - a novel video dialog model operating over a generic multi-modal state tracking scheme. Current models that claim to perform multi-modal state tracking fall short in two major aspects: (1) They either track only one modality (mo
External link:
http://arxiv.org/abs/2407.02218
Recent work on dialogue-based collaborative plan acquisition (CPA) has suggested that Theory of Mind (ToM) modelling can improve missing knowledge prediction in settings with asymmetric skill-sets and knowledge. Although ToM was claimed to be importa
External link:
http://arxiv.org/abs/2405.12621
We present the Object Language Video Transformer (OLViT) - a novel model for video dialog operating over a multi-modal attention-based dialog state tracker. Existing video dialog models struggle with questions requiring both spatial and temporal loca
External link:
http://arxiv.org/abs/2402.13146
We propose $\mathbb{VD}$-$\mathbb{GR}$ - a novel visual dialog model that combines pre-trained language models (LMs) with graph neural networks (GNNs). Prior works mainly focused on one class of models at the expense of the other, thus missing out on
External link:
http://arxiv.org/abs/2310.16590
We propose Neuro-Symbolic Visual Dialog (NSVD) - the first method to combine deep learning and symbolic program execution for multi-round visually-grounded reasoning. NSVD significantly outperforms existing purely-connectionist methods on two key chal
External link:
http://arxiv.org/abs/2208.10353