Showing 1 - 10 of 415 for query: '"Ricci, Elisa"'
Gaze target detection aims at determining the image location where a person is looking. While existing studies have made significant progress in this area by regressing accurate gaze heatmaps, these achievements have largely relied on access to exten…
External link:
http://arxiv.org/abs/2409.18561
Authors:
Tur, Anil Osman, Conti, Alessandro, Beyan, Cigdem, Boscaini, Davide, Larcher, Roberto, Messelodi, Stefano, Poiesi, Fabio, Ricci, Elisa
In smart retail applications, the large number of products and their frequent turnover necessitate reliable zero-shot object classification methods. The zero-shot assumption is essential to avoid the need for re-training the classifier every time a n…
External link:
http://arxiv.org/abs/2409.14963
Authors:
Bosetti, Massimo, Zhang, Shibingfeng, Liberatori, Benedetta, Zara, Giacomo, Ricci, Elisa, Rota, Paolo
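The zero-shot classification setting described in the abstract above can be sketched in a few lines: a CLIP-style model embeds an image and the text of each class name into a shared space, and the class with the highest cosine similarity wins, so adding a new product only requires embedding its name. The random vectors below are hypothetical stand-ins for a real encoder's outputs, and the class names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize along the last axis so dot products become cosines."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Pretend text embeddings for three product classes (one prompt per class)
# and one query image embedding close to class 1 -- toy data, not CLIP output.
class_names = ["cereal box", "shampoo bottle", "canned soup"]
text_emb = normalize(rng.normal(size=(3, 512)))
image_emb = normalize(text_emb[1] + 0.01 * rng.normal(size=512))

# Zero-shot prediction: highest cosine similarity wins; no classifier
# re-training is needed when the product catalogue changes.
scores = text_emb @ image_emb
pred = class_names[int(np.argmax(scores))]
print(pred)
```

With a real vision-language model, `text_emb` would come from encoding prompts like "a photo of a cereal box" and `image_emb` from the image encoder; the decision rule itself is unchanged.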
Vision-language models (VLMs) have demonstrated remarkable performance across various visual tasks, leveraging joint learning of visual and textual representations. While these models excel in zero-shot image tasks, their application to zero-shot vid…
External link:
http://arxiv.org/abs/2408.16412
Vision-Language Models (VLMs) combine visual and textual understanding, rendering them well-suited for diverse tasks like generating image captions and answering visual questions across various domains. However, these capabilities are built upon trai…
External link:
http://arxiv.org/abs/2408.01228
The rapid advancement of generative models has significantly enhanced the realism and customization of digital content creation. The increasing power of these tools, coupled with their ease of access, fuels the creation of photorealistic fake content…
External link:
http://arxiv.org/abs/2407.21554
Machine unlearning (MU) aims to erase data from a model as if it never saw them during training. To this extent, existing MU approaches assume complete or partial access to the training data, which can be limited over time due to privacy regulations.
External link:
http://arxiv.org/abs/2407.12069
Efficient finetuning of vision-language models (VLMs) like CLIP for specific downstream tasks is gaining significant attention. Previous works primarily focus on prompt learning to adapt CLIP to a variety of downstream tasks, however suffering…
External link:
http://arxiv.org/abs/2407.08374
Authors:
Conti, Alessandro, Fini, Enrico, Rota, Paolo, Wang, Yiming, Mancini, Massimiliano, Ricci, Elisa
Assessing the capabilities of large multimodal models (LMMs) often requires the creation of ad-hoc evaluations. Currently, building new benchmarks requires tremendous amounts of manual work for each specific analysis. This makes the evaluation proces…
External link:
http://arxiv.org/abs/2406.12321
Vision-Language Models seamlessly discriminate among arbitrary semantic categories, yet they still suffer from poor generalization when presented with challenging examples. For this reason, Episodic Test-Time Adaptation (TTA) strategies have recently…
External link:
http://arxiv.org/abs/2405.18330
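A common mechanism behind test-time adaptation, which the episodic TTA abstract above builds on, is entropy minimization in the spirit of TENT-style methods: for each test batch a small set of parameters is tuned to make predictions more confident, then reset before the next episode ("episodic"). The sketch below uses a single per-class bias as a hypothetical stand-in for the adapted parameters and random logits as toy data.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_entropy(z):
    p = softmax(z)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def adapt_episode(logits, lr=0.5, steps=30):
    """One TTA episode: start from a fresh bias (episodic reset) and run
    gradient descent on the mean prediction entropy of the batch."""
    bias = np.zeros(logits.shape[-1])
    for _ in range(steps):
        p = softmax(logits + bias)
        h = -(p * np.log(p + 1e-12)).sum(axis=-1)   # per-sample entropy H
        # analytic gradient of H w.r.t. logits: dH/dz_j = -p_j (log p_j + H)
        grad = (-p * (np.log(p + 1e-12) + h[:, None])).mean(axis=0)
        bias -= lr * grad
    return bias

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4))                    # one toy test batch
before = mean_entropy(logits)
after = mean_entropy(logits + adapt_episode(logits))
print(round(before, 3), round(after, 3))            # entropy drops after adaptation
```

In real methods the adapted parameters are typically the normalization statistics or affine parameters of the network rather than a bias on the logits, but the episodic reset-adapt-predict loop is the same.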
Prompt tuning has emerged as an effective rehearsal-free technique for class-incremental learning (CIL) that learns a tiny set of task-specific parameters (or prompts) to instruct a pre-trained transformer to learn on a sequence of tasks. Albeit effe…
External link:
http://arxiv.org/abs/2405.15633
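The rehearsal-free prompt-tuning setup sketched in the last abstract can be illustrated with a tiny prompt pool, loosely in the style of query-key prompt-pool methods such as L2P: the backbone stays frozen, each task contributes only a small learnable (key, prompt) pair, and at test time a query feature selects the closest key, so no past-task data needs to be replayed. All tensors below are toy stand-ins, not a real pre-trained transformer.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_tasks = 16, 3

frozen_W = rng.normal(size=(dim, dim))     # pretrained weights: never updated

def encode(x):
    """Frozen feature extractor standing in for a pre-trained transformer."""
    return np.tanh(frozen_W @ x)

# Prompt pool: one (key, prompt) pair per task. Only these ~2*dim parameters
# per task would be trained in CIL; the backbone above stays frozen.
keys = rng.normal(size=(n_tasks, dim))
prompts = rng.normal(size=(n_tasks, dim))

def select_prompt(x):
    """Query-key matching: pick the prompt whose key is closest (cosine)."""
    q = encode(x)
    sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

x = rng.normal(size=dim)
task = select_prompt(x)
conditioned = encode(x + prompts[task])    # prompt instructs the frozen model
print(task, conditioned.shape)
```

In real prompt-tuning methods the prompts are prepended as extra tokens to the transformer input rather than added to it; adding them here keeps the toy example to a plain vector pipeline while preserving the select-then-condition structure.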