Language-Agnostic Visual-Semantic Embeddings

Authors:	Jonatas Wehrmann, Rodrigo C. Barros
Year of publication:	2021
Source:	Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021).
DOI:	10.5753/ctd.2021.15751
Description:	We propose a framework for training language-invariant cross-modal retrieval models. We introduce four novel text encoding approaches, as well as a character-based word-embedding approach that allows the model to project similar words across languages into the same word-embedding space. In addition, performing cross-modal retrieval at the character level substantially decreases the storage requirements of the text encoder, allowing for lighter and more scalable retrieval architectures. The storage requirements of the proposed character-based, language-invariant textual encoder are virtually unaffected when new languages are added to the system. Contributions include new methods for building character-level word embeddings, an improved loss function, and a novel cross-language alignment module that not only makes the architecture language-invariant but also yields better predictive performance. Moreover, we introduce a module called ADAPT, which provides query-aware visual representations that generate large improvements in recall on four widely used large-scale image-text datasets. We show that our models outperform the current state of the art in all scenarios. This thesis can open a new path in retrieval research, allowing for the effective use of captions in multiple-language scenarios.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::11dd33cde278f4f134a90db1f1ee7e5d https://doi.org/10.5753/ctd.2021.15751