KnowER: Knowledge enhancement for efficient text-video retrieval

Authors: Hongwei Kou, Yingyun Yang, Yan Hua
Language: English
Publication year: 2023
Subject:
Source: Intelligent and Converged Networks, Vol 4, Iss 2, Pp 93-105 (2023)
Document type: article
ISSN: 2708-6240
DOI: 10.23919/ICN.2023.0009
Description: The widespread adoption of the mobile Internet and the Internet of Things (IoT) has led to a significant increase in the amount of video data. While video data are increasingly important, language and text remain the primary methods of interaction in everyday communication, so text-based cross-modal retrieval has become a crucial demand in many applications. Most previous text-video retrieval works use the implicit knowledge of pre-trained models such as contrastive language-image pre-training (CLIP) to boost retrieval performance. However, implicit knowledge only records the co-occurrence relationships present in the data, and it cannot help the model understand specific words or scenes. Another type of out-of-domain knowledge, explicit knowledge, which usually takes the form of a knowledge graph, can play an auxiliary role in understanding the content of different modalities. Therefore, we study the application of an external knowledge base to text-video retrieval models for the first time, and propose KnowER, a model based on knowledge enhancement for efficient text-video retrieval. The knowledge-enhanced model achieves state-of-the-art performance on three widely used text-video retrieval datasets: MSRVTT, DiDeMo, and MSVD.
Database: Directory of Open Access Journals