Showing 1 - 1 of 1 for search: '"Miranda, Imanol"'
Existing Vision-Language Compositionality (VLC) benchmarks such as SugarCrepe are formulated as image-to-text retrieval problems: given an image, a model must choose between the correct textual description and a synthetic hard negative text.
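The image-to-text formulation described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `i2t_accuracy`, the `toy_score` scorer, and the two examples are hypothetical and are not taken from SugarCrepe or from the linked paper. A real evaluation would replace `toy_score` with the image-text similarity of a vision-language model.

```python
from typing import Callable, Sequence, Set, Tuple

# One example: (image, correct caption, hard negative caption).
# The "image" is stood in for by a set of tags (a toy assumption).
Example = Tuple[Set[str], str, str]


def i2t_accuracy(examples: Sequence[Example],
                 score: Callable[[Set[str], str], float]) -> float:
    """Fraction of examples where the correct caption scores strictly
    higher than the hard negative for the same image."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(
        1 for image, positive, negative in examples
        if score(image, positive) > score(image, negative)
    )
    return correct / len(examples)


def toy_score(image_tags: Set[str], caption: str) -> float:
    """Hypothetical scorer: counts caption words found among the image tags.
    It ignores word order, so it cannot tell swapped attributes apart."""
    return float(sum(1 for word in caption.split() if word in image_tags))


examples = [
    ({"dog", "chasing", "cat"},
     "a dog chasing a cat", "a dog chasing a ball"),
    ({"red", "car", "blue", "house"},
     "a red car near a blue house", "a blue car near a red house"),
]

accuracy = i2t_accuracy(examples, toy_score)
print(accuracy)
```

In this toy setup the order-insensitive scorer separates the first pair but ties on the second, where the negative only swaps two attributes, so that example is counted as incorrect.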
External link:
http://arxiv.org/abs/2406.09952