KnowER: Knowledge enhancement for efficient text-video retrieval

Author:	Hongwei Kou, Yingyun Yang, Yan Hua
Language:	English
Year of publication:	2023
Subject:	text-video retrieval; knowledge graph; contrastive language-image pre-training (CLIP); Telecommunication; TK5101-6720
Source:	Intelligent and Converged Networks, Vol 4, Iss 2, Pp 93-105 (2023)
Document type:	article
ISSN:	2708-6240
DOI:	10.23919/ICN.2023.0009
Description:	The widespread adoption of the mobile Internet and the Internet of Things (IoT) has led to a significant increase in the amount of video data. Although video data are increasingly important, language and text remain the primary means of interaction in everyday communication, so text-based cross-modal retrieval has become a crucial requirement in many applications. Most previous text-video retrieval methods exploit the implicit knowledge of pre-trained models, such as contrastive language-image pre-training (CLIP), to boost retrieval performance. However, implicit knowledge only records co-occurrence relationships present in the data and cannot help the model understand specific words or scenes. Explicit knowledge, another type of out-of-domain knowledge usually represented as a knowledge graph, can play an auxiliary role in understanding the content of different modalities. We therefore study, for the first time, the application of an external knowledge base in a text-video retrieval model and propose KnowER, a knowledge-enhanced model for efficient text-video retrieval. The knowledge-enhanced model achieves state-of-the-art performance on three widely used text-video retrieval datasets, namely MSRVTT, DiDeMo, and MSVD.
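CLIP-based text-video retrieval is commonly framed as ranking candidate videos by the cosine similarity between a text embedding and a pooled video embedding. The sketch below is a generic illustration under stated assumptions, not KnowER's method: it assumes per-frame embeddings have already been produced by a CLIP-like encoder, uses simple mean pooling over frames, and the function names (`mean_pool`, `cosine`, `rank_videos`) and toy vectors are hypothetical. The knowledge-graph enhancement described in the abstract is not modeled.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def mean_pool(frame_embeddings: List[Vector]) -> List[float]:
    """Average per-frame embeddings into one video embedding (assumed pooling choice)."""
    n = len(frame_embeddings)
    dim = len(frame_embeddings[0])
    return [sum(frame[i] for frame in frame_embeddings) / n for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_videos(text_emb: Vector, videos: Dict[str, List[Vector]]) -> List[str]:
    """Return video ids sorted by descending similarity to the text query embedding."""
    scores = {vid: cosine(text_emb, mean_pool(frames)) for vid, frames in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional embeddings standing in for real CLIP features.
query = [1.0, 0.0, 0.0]
corpus = {
    "cooking": [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]],
    "soccer": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
}
print(rank_videos(query, corpus))  # ['cooking', 'soccer']
```

Mean pooling is the simplest way to collapse frames into a single vector; real systems often use learned temporal aggregation instead.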
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/636cfb67015a42df93f8f68225b02531