Author: |
Willy Fitra Hendria, Vania Velda, Bahy Helmi Hartoyo Putra, Fikriansyah Adzaka, Cheol Jeong |
Language: |
English |
Year of publication: |
2023 |
Subject: |
|
Source: |
Journal of King Saud University: Computer and Information Sciences, Vol 35, Iss 4, Pp 50-62 (2023) |
Document type: |
article |
ISSN: |
1319-1578 |
DOI: |
10.1016/j.jksuci.2023.03.006 |
Description: |
Many existing video captioning methods capture action information in the video by exploiting features extracted from an action recognition model. However, directly using the action features without an object-specific representation may not capture object interactions well. Consequently, the generated captions may not describe the actions and objects in the scene accurately enough. To address this issue, we propose to incorporate the action features as edge features in a graph neural network whose nodes represent objects, thereby capturing a finer visual representation of object-action-object relationships. Previous graph-based video captioning methods commonly relied on a pretrained object detection model to create the node representations; such a detector, however, may fail to detect some important objects. To alleviate this problem, we further introduce a grid-based node representation in which the nodes are represented by features extracted from grids of video frames. With this representation, the important objects in the scene are captured more thoroughly. To avoid adding any complexity during inference, the knowledge of the proposed graph is transferred to another neural network via knowledge distillation. Our proposed method achieved state-of-the-art results on two popular video captioning datasets, i.e., MSVD and MSR-VTT, on all metrics. The code of our proposed method is available at https://github.com/Sejong-VLI/V2T-Action-Graph-JKSUCIS-2023. |
Database: |
Directory of Open Access Journals |
External link: |
|
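The following is a minimal, illustrative PyTorch sketch of the idea summarized in the Description above: action features used as edge features in a graph neural network whose nodes are grid-based frame features. All dimensions, layer choices, and names are assumptions made for illustration; this is not the authors' released implementation (see the linked repository for that).

```python
# Sketch (not the paper's code) of edge-conditioned message passing:
# nodes = grid/object features, edges = action features between node pairs.
import torch
import torch.nn as nn


class ActionEdgeGraphLayer(nn.Module):
    """One message-passing step in which each message between two nodes is
    conditioned on an action-derived edge feature, so that object-action-object
    relationships are modeled explicitly."""

    def __init__(self, node_dim: int, edge_dim: int):
        super().__init__()
        # A message combines the receiver node, the sender node, and the edge feature.
        self.message_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, node_dim),
            nn.ReLU(),
        )
        # Node update combines the original node with its aggregated messages.
        self.update_mlp = nn.Linear(2 * node_dim, node_dim)

    def forward(self, nodes: torch.Tensor, edges: torch.Tensor) -> torch.Tensor:
        # nodes: (N, node_dim) grid-based node features for one frame or clip
        # edges: (N, N, edge_dim) action features for every ordered node pair
        n = nodes.size(0)
        senders = nodes.unsqueeze(0).expand(n, n, -1)    # [i, j] = node j
        receivers = nodes.unsqueeze(1).expand(n, n, -1)  # [i, j] = node i
        messages = self.message_mlp(torch.cat([receivers, senders, edges], dim=-1))
        aggregated = messages.mean(dim=1)                # average over senders
        return self.update_mlp(torch.cat([nodes, aggregated], dim=-1))


if __name__ == "__main__":
    # Toy example: a 4x4 grid of one frame -> 16 nodes with assumed 512-d features.
    layer = ActionEdgeGraphLayer(node_dim=512, edge_dim=512)
    grid_nodes = torch.randn(16, 512)         # grid-based node representation
    action_edges = torch.randn(16, 16, 512)   # action features as edge features
    refined = layer(grid_nodes, action_edges)
    print(refined.shape)                      # torch.Size([16, 512])
```

In the method described above, the refined node features would feed a caption decoder, and the graph's knowledge would then be distilled into a simpler network so that no graph computation is needed at inference; the sketch only shows the edge-conditioned message-passing step.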