Full-Network Embedding in a Multimodal Embedding Pipeline
Authors: | Vilalta Arias, Armand; Garcia Gasulla, Dario (ORCID 0000-0001-6732-5641); Parés Pont, Ferran; Moreno Vázquez, Jonatan; Ayguadé Parra, Eduard (ORCID 0000-0002-5146-103X); Labarta Mancho, Jesús José (ORCID 0000-0002-7489-4727); Cortés García, Claudio Ulises (ORCID 0000-0003-0192-3096); Suzumura, Toyotaro |
---|---|
Contributors: | Universitat Politècnica de Catalunya. Doctorat en Intel·ligència Artificial; Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. Departament de Ciències de la Computació; Barcelona Supercomputing Center; Universitat Politècnica de Catalunya. KEMLG - Grup d'Enginyeria del Coneixement i Aprenentatge Automàtic; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
Language: | English |
Year of publication: | 2017 |
Subject: | FOS: Computer and information sciences; Imatges -- Anàlisi; Artificial intelligence; Informàtica::Intel·ligència artificial::Aprenentatge automàtic [Àrees temàtiques de la UPC]; Computer Science - Computation and Language; Computer Vision and Pattern Recognition (cs.CV); Image annotation; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Neural and Evolutionary Computing; Deep learning; Image analysis; Transfer learning; Semantic deep learning; Machine learning; Aprenentatge automàtic; Image data mining; Neural and Evolutionary Computing (cs.NE); Image retrieval; Computation and Language (cs.CL); Multimodal embedding |
Source: | UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC) |
Description: | The current state of the art for image annotation and image retrieval tasks is obtained through deep neural networks, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation scheme. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale representation of images, which results in richer characterizations. To measure the influence of the Full-Network embedding, we evaluate its performance on three different datasets, and compare the results with the original multimodal embedding generation scheme when using a one-layer image embedding, and with the rest of the state of the art. Results for image annotation and image retrieval tasks indicate that the Full-Network embedding is consistently superior to the one-layer embedding. These results motivate the integration of the Full-Network embedding into any multimodal embedding generation scheme, something feasible thanks to the flexibility of the approach. Presented at the 2nd Workshop on Semantic Deep Learning (SemDeep-2) at the 12th International Conference on Computational Semantics (IWCS) 2017 |
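The description contrasts one-layer image embeddings with the multi-scale Full-Network embedding, which gathers activations from every layer and turns them into a discretized feature vector. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name, the input convention (per-layer activations already spatially average-pooled to shape `(n_images, n_features)`), and the threshold values `t_neg`/`t_pos` are illustrative assumptions.

```python
import numpy as np

def full_network_embedding(layer_activations, t_neg=-0.25, t_pos=0.15):
    """Sketch of a Full-Network-style embedding.

    layer_activations: list of arrays, one per network layer, each of
    shape (n_images, n_features) after spatial average pooling.
    Thresholds are assumed values for illustration only.
    """
    # Concatenate features from all layers into one multi-scale vector.
    feats = np.concatenate(layer_activations, axis=1)
    # Standardize each feature across the image set (z-scores).
    mean = feats.mean(axis=0)
    std = feats.std(axis=0) + 1e-8  # avoid division by zero
    z = (feats - mean) / std
    # Discretize into {-1, 0, 1}: strongly low / neutral / strongly high.
    emb = np.zeros_like(z, dtype=np.int8)
    emb[z > t_pos] = 1
    emb[z < t_neg] = -1
    return emb
```

The discretized multi-scale vector produced here would then replace the one-layer image representation fed into the multimodal (image-text) embedding scheme.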
Database: | OpenAIRE |
External link: |