All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers

Authors:	Scribano, Carmelo, Sapienza, Davide, Franchini, Giorgia, Verucchi, Micaela, Bertogna, Marko
Publication year:	2021
Subject:	Computer Science - Computer Vision and Pattern Recognition
Document type:	Working Paper
Description:	Combining natural language with vision is a distinctive challenge in Artificial Intelligence. Track 5 of the AI City Challenge, Natural Language-Based Vehicle Retrieval, focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT, which provides an embedding of the textual descriptions, and (ii) a convolutional backbone together with a Transformer model, which embed the visual information. To train the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021. Comment: CVPR 2021 AI CITY CHALLENGE Natural Language-Based Vehicle Retrieval
Database:	arXiv
External link:	http://arxiv.org/abs/2106.10153
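
The description above mentions learning a distance measure between visual and language embeddings with a variation of the Triplet Margin Loss. The record does not spell out that variation, so the following is only a minimal sketch of the standard triplet margin loss, assuming Euclidean distance and plain Python lists as toy embeddings. The function names `euclidean`, `triplet_margin_loss`, and `rank_tracks` are hypothetical and are not taken from the AYCE code.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (NOT the paper's variation).

    anchor:   e.g. a text embedding (assumption for illustration)
    positive: embedding of the matching vehicle track
    negative: embedding of a non-matching vehicle track
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    # Zero loss once the negative is farther than the positive by `margin`.
    return max(0.0, d_pos - d_neg + margin)


def rank_tracks(text_emb, track_embs):
    """Return track indices sorted from nearest to farthest embedding."""
    return sorted(
        range(len(track_embs)),
        key=lambda i: euclidean(text_emb, track_embs[i]),
    )
```

At retrieval time, a query description would be embedded once and the candidate tracks ranked by distance, which is what `rank_tracks` illustrates on toy vectors.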