Search results - "Kusumba, Abhiram"

Report

TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives

Authors: Patel, Maitreya; Kusumba, Abhiram; Cheng, Sheng; Kim, Changhoon; Gokhale, Tejas; Baral, Chitta; Yang, Yezhou

Contrastive Language-Image Pretraining (CLIP) models maximize the mutual information between text and visual modalities to learn representations. This makes the nature of the training data a significant factor in the efficacy of CLIP for downstream t

External link: http://arxiv.org/abs/2411.02545
