Language-Agnostic Visual-Semantic Embeddings

Autor: Jonatas Wehrmann, Rodrigo C. Barros
Rok vydání: 2021
Zdroj: Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021).
DOI: 10.5753/ctd.2021.15751
Popis: We propose a framework for training language-invariant cross-modal retrieval models. We introduce four novel text encoding approaches, as well as a character-based word-embedding approach, allowing the model to project similar words across languages into the same word-embedding space. In addition, by performing cross-modal retrieval at the character level, the storage requirements for a text encoder decrease substantially, allowing for lighter and more scalable retrieval architectures. The proposed language-invariant textual encoder based on characters is virtually unaffected in terms of storage requirements when novel languages are added to the system. Contributions include new methods for building character-level-based word-embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant, but also presents better predictive performance. Moreover, we introduce a module called \adapt, which is responsible for providing query-aware visual representations that generate large improvements in terms of recall for four widely-used large-scale image-text datasets. We show that our models outperform the current state-of-the-art all scenarios. This thesis can serve as a new path on retrieval research, now allowing for the effective use of captions in multiple-language scenarios.
Databáze: OpenAIRE