Search results

Report

Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis

Authors: Bucciarelli, Davide; Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

The task of image captioning demands an algorithm to generate natural language descriptions of visual inputs. Recent advancements have seen a convergence between image captioning research and the development of Large Language Models (LLMs) and Multim

External link: http://arxiv.org/abs/2412.03665

Report

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

Authors: Barsellotti, Luca; Bianchi, Lorenzo; Messina, Nicola; Carrara, Fabio; Cornia, Marcella; Baraldi, Lorenzo; Falchi, Fabrizio; Cucchiara, Rita

Open-Vocabulary Segmentation (OVS) aims at segmenting images from free-form textual concepts without predefined training classes. While existing vision-language models such as CLIP can generate segmentation masks by leveraging coarse spatial informat

External link: http://arxiv.org/abs/2411.19331

Report

Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Authors: Cocchi, Federico; Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. They have recently garnered attention due to their capability to address complex tasks involving both modalities. H

External link: http://arxiv.org/abs/2411.16863

Report

Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments

Authors: Barsellotti, Luca; Bigazzi, Roberto; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

In the last years, the research interest in visual navigation towards objects in indoor environments has grown significantly. This growth can be attributed to the recent availability of large navigation datasets in photo-realistic simulated environme

External link: http://arxiv.org/abs/2410.18195

Report

Domain decomposition for integer optimal control with total variation regularization

Authors: Baraldi, Robert; Manns, Paul

Total variation integer optimal control problems admit solutions and necessary optimality conditions via geometric variational analysis. In spite of the existence of said solutions, algorithms which solve the discretized objective suffer from high nu

External link: http://arxiv.org/abs/2410.15672

Report

Positive-Augmented Contrastive Learning for Vision-and-Language Evaluation and Training

Authors: Sarto, Sara; Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Despite significant advancements in caption generation, existing evaluation metrics often fail to capture the full quality or fine-grained details of captions. This is mainly due to their reliance on non-specific human-written references or noisy pre

External link: http://arxiv.org/abs/2410.07336

Report

Design, fabrication, and testing of diamond axicons for X-ray microscopy applications

Authors: Samadi, Nazanin; Seiboth, Frank; Dias, Carlos Sato Baraldi; Novikov, Dmitri; Spiers, Kathryn; Shi, Xianbo

This work presents the design, fabrication, and experimental validation of a refractive diamond axicon for X-ray beam shaping. The diamond axicon was developed to overcome the limitations of polymer-based axicons particularly for application in Trans

External link: http://arxiv.org/abs/2410.01327

Report

Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection

Authors: Betti, Federico; Baraldi, Lorenzo; Cucchiara, Rita; Sebe, Nicu

Diffusion models have significantly advanced generative AI, but they encounter difficulties when generating complex combinations of multiple objects. As the final result heavily depends on the initial seed, accurately ensuring the desired output can

External link: http://arxiv.org/abs/2409.10597

Report

PSZ2 G282.28+49.94, a recently discovered analogue of the famous Bullet Cluster

We present a detailed study of the gas and galaxy properties of the cluster PSZ2 G282.28+49.94 detected in the Planck all-sky survey. The intracluster medium (ICM) of this object at z=0.56 exhibits a cometary-like shape. Combining Chandra and TNG obs

External link: http://arxiv.org/abs/2409.07290

Report

Fluent and Accurate Image Captioning with a Self-Trained Reward Model

Authors: Moratelli, Nicholas; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita

Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic richness and

External link: http://arxiv.org/abs/2408.16827
