Search results

Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation

Authors: Mistretta, Marco; Baldrati, Alberto; Bertini, Marco; Bagdanov, Andrew D.

Vision-Language Models (VLMs) demonstrate remarkable zero-shot generalization to unseen tasks, but fall short of the performance of supervised methods in generalizing to downstream tasks with limited data. Prompt learning is emerging as a parameter-e

External link: http://arxiv.org/abs/2407.03056

iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval

Authors: Agnolucci, Lorenzo; Baldrati, Alberto; Bertini, Marco; Del Bimbo, Alberto

Given a query consisting of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images visually similar to the reference one while incorporating the changes specified in the relative caption. The reliance

External link: http://arxiv.org/abs/2405.02951

Multimodal-Conditioned Latent Diffusion Models for Fashion Image Editing

Authors: Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

Fashion illustration is a crucial medium for designers to convey their creative vision and transform design concepts into tangible representations that showcase the interplay between clothing and the human body. In the context of fashion design, comp

External link: http://arxiv.org/abs/2403.14828

Mapping Memes to Words for Multimodal Hateful Meme Classification

Authors: Burbi, Giovanni; Baldrati, Alberto; Agnolucci, Lorenzo; Bertini, Marco; Del Bimbo, Alberto

Multimodal image-text memes are prevalent on the internet, serving as a unique form of communication that combines visual and textual elements to convey humor, ideas, or emotions. However, some memes take a malicious turn, promoting hateful content a

External link: http://arxiv.org/abs/2310.08368

Exploiting CLIP-based Multi-modal Approach for Artwork Classification and Retrieval

Authors: Baldrati, Alberto; Bertini, Marco; Uricchio, Tiberio; Del Bimbo, Alberto

Given the recent advances in multimodal image pretraining where visual models trained with semantically dense textual supervision tend to have better generalization capabilities than those trained using categorical attributes or through unsupervised

External link: http://arxiv.org/abs/2309.12110

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Authors: Cartella, Giuseppe; Baldrati, Alberto; Morelli, Davide; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

The inexorable growth of online shopping and e-commerce demands scalable and robust machine learning-based solutions to accommodate customer requirements. In the context of automatic tagging classification and multimodal retrieval, prior works either

External link: http://arxiv.org/abs/2309.05551

Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Authors: Baldrati, Alberto; Bertini, Marco; Uricchio, Tiberio; Del Bimbo, Alberto

Given a query composed of a reference image and a relative caption, the Composed Image Retrieval goal is to retrieve images visually similar to the reference one that integrates the modifications expressed by the caption. Given that recent research h

External link: http://arxiv.org/abs/2308.11485

ECO: Ensembling Context Optimization for Vision-Language Models

Authors: Agnolucci, Lorenzo; Baldrati, Alberto; Todino, Francesco; Becattini, Federico; Bertini, Marco; Del Bimbo, Alberto

Image recognition has recently witnessed a paradigm shift, where vision-language models are now used to perform few-shot classification based on textual prompts. Among these, the CLIP model has shown remarkable capabilities for zero-shot transfer by

External link: http://arxiv.org/abs/2307.14063

LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On

Authors: Morelli, Davide; Baldrati, Alberto; Cartella, Giuseppe; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

The rapidly evolving fields of e-commerce and metaverse continue to seek innovative approaches to enhance the consumer experience. At the same time, recent advancements in the development of diffusion models have enabled generative networks to create

External link: http://arxiv.org/abs/2305.13501

Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing

Authors: Baldrati, Alberto; Morelli, Davide; Cartella, Giuseppe; Cornia, Marcella; Bertini, Marco; Cucchiara, Rita

Fashion illustration is used by designers to communicate their vision and to bring the design idea from conceptualization to realization, showing how clothes interact with the human body. In this context, computer vision can thus be used to improve t

External link: http://arxiv.org/abs/2304.02051
