Showing 1 - 10 of 14,686 results for search: '"A, Baraldi"'
Author:
Bucciarelli, Davide, Moratelli, Nicholas, Cornia, Marcella, Baraldi, Lorenzo, Cucchiara, Rita
The task of image captioning demands an algorithm to generate natural language descriptions of visual inputs. Recent advancements have seen a convergence between image captioning research and the development of Large Language Models (LLMs) and Multim
External link:
http://arxiv.org/abs/2412.03665
Author:
Barsellotti, Luca, Bianchi, Lorenzo, Messina, Nicola, Carrara, Fabio, Cornia, Marcella, Baraldi, Lorenzo, Falchi, Fabrizio, Cucchiara, Rita
Open-Vocabulary Segmentation (OVS) aims at segmenting images from free-form textual concepts without predefined training classes. While existing vision-language models such as CLIP can generate segmentation masks by leveraging coarse spatial informat
External link:
http://arxiv.org/abs/2411.19331
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Multimodal LLMs (MLLMs) are the natural extension of large language models to handle multimodal inputs, combining text and image data. They have recently garnered attention due to their capability to address complex tasks involving both modalities. H
External link:
http://arxiv.org/abs/2411.16863
In recent years, research interest in visual navigation towards objects in indoor environments has grown significantly. This growth can be attributed to the recent availability of large navigation datasets in photo-realistic simulated environme
External link:
http://arxiv.org/abs/2410.18195
Author:
Baraldi, Robert, Manns, Paul
Total variation integer optimal control problems admit solutions and necessary optimality conditions via geometric variational analysis. In spite of the existence of said solutions, algorithms which solve the discretized objective suffer from high nu
External link:
http://arxiv.org/abs/2410.15672
Despite significant advancements in caption generation, existing evaluation metrics often fail to capture the full quality or fine-grained details of captions. This is mainly due to their reliance on non-specific human-written references or noisy pre
External link:
http://arxiv.org/abs/2410.07336
Author:
Samadi, Nazanin, Seiboth, Frank, Dias, Carlos Sato Baraldi, Novikov, Dmitri, Spiers, Kathryn, Shi, Xianbo
This work presents the design, fabrication, and experimental validation of a refractive diamond axicon for X-ray beam shaping. The diamond axicon was developed to overcome the limitations of polymer-based axicons particularly for application in Trans
External link:
http://arxiv.org/abs/2410.01327
Diffusion models have significantly advanced generative AI, but they encounter difficulties when generating complex combinations of multiple objects. As the final result heavily depends on the initial seed, accurately ensuring the desired output can
External link:
http://arxiv.org/abs/2409.10597
Author:
Bartalucci, I., Rossetti, M., Boschin, W., Girardi, M., Nonino, M., Baraldi, E., Balboni, M., Coe, D., De Grandi, S., Gastaldello, F., Ghizzardi, S., Giacintucci, S., Grillo, C., Harvey, D., Lovisari, L., Molendi, S., Resseguier, T., Riva, G., Venturi, T., Zitrin, A.
We present a detailed study of the gas and galaxy properties of the cluster PSZ2 G282.28+49.94 detected in the Planck all-sky survey. The intracluster medium (ICM) of this object at z=0.56 exhibits a cometary-like shape. Combining Chandra and TNG obs
External link:
http://arxiv.org/abs/2409.07290
Fine-tuning image captioning models with hand-crafted rewards like the CIDEr metric has been a classical strategy for promoting caption quality at the sequence level. This approach, however, is known to limit descriptiveness and semantic richness and
External link:
http://arxiv.org/abs/2408.16827