Contrastive Learning for Weakly Supervised Phrase Grounding
Authors: | Derek Hoiem, Tanmay Gupta, Jan Kautz, Gal Chechik, Arash Vahdat, Xiaodong Yang |
---|---|
Year of publication: | 2020 |
Subject: | Phrase; Computer science; business.industry; Image (category theory); 05 social sciences; Mutual information; Construct (python library); 010501 environmental sciences; computer.software_genre; 01 natural sciences; Upper and lower bounds; 0502 economics and business; Code (cryptography); Artificial intelligence; Language model; 050207 economics; business; computer; Word (computer architecture); Natural language processing; 0105 earth and related environmental sciences |
Source: | Computer Vision – ECCV 2020, ISBN 9783030585792, ECCV (3) |
Description: | Phrase grounding, the problem of associating image regions with caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on the mutual information between images and caption words. Given pairs of images and captions, we maximize the compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language-model-guided word substitutions. Training with our negatives yields a \(\sim 10\%\) absolute gain in accuracy over randomly sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions achieves \(76.7\%\) accuracy on the Flickr30K Entities benchmark, a healthy gain of \(5.7\%\). Our code and project material will be available at http://tanmaygupta.info/info-ground. |
Database: | OpenAIRE |
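The description outlines three ingredients: word-to-region attention, a contrastive (InfoNCE-style) objective that lower-bounds mutual information, and negative captions built by substituting a single word. The Python/NumPy sketch below illustrates these ideas on toy data. It is not the authors' implementation. Scaled dot-product attention, mean aggregation over words, one-hot toy embeddings, and all helper names (`word_region_attention`, `ground`, `compatibility`, `info_nce_loss`, `substitute_word`, `encode`) are assumptions made for illustration. The replacement word is also supplied by hand here, whereas the paper obtains it from a language model.

```python
import numpy as np


def word_region_attention(words, regions):
    """Scaled dot-product attention of each word over the image regions (assumed form).

    words: (W, d) word features; regions: (R, d) region features.
    Returns attention weights (W, R) and attended region features (W, d).
    """
    d = words.shape[1]
    logits = words @ regions.T / np.sqrt(d)          # (W, R) word-region scores
    logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over regions
    return attn, attn @ regions                      # attended features (W, d)


def ground(words, regions):
    """Hypothetical helper: index of the most-attended region for each word."""
    attn, _ = word_region_attention(words, regions)
    return attn.argmax(axis=1)


def compatibility(words, regions):
    """Image-caption score: mean over words of <word, attended regions> (assumed aggregation)."""
    _, attended = word_region_attention(words, regions)
    return float(np.mean(np.sum(words * attended, axis=1)))


def info_nce_loss(pos_score, neg_scores):
    """-log(exp(pos) / sum(exp(all scores))) with one positive and K negatives."""
    scores = np.concatenate([[pos_score], np.asarray(neg_scores, dtype=float)])
    m = scores.max()
    return float(m + np.log(np.exp(scores - m).sum()) - pos_score)


def substitute_word(tokens, position, replacement):
    """Hypothetical helper: build a negative caption with one word swapped.

    The paper draws replacements from a language model; here the caller
    passes the replacement directly.
    """
    negative = list(tokens)
    negative[position] = replacement
    return negative


# Toy data: one-hot word embeddings so the example is deterministic.
vocab = ["a", "dog", "cat", "on", "grass", "sofa"]
d = 8
embed = {w: np.eye(d)[i] for i, w in enumerate(vocab)}


def encode(tokens):
    """Stack the toy embeddings of a token list into a (W, d) array."""
    return np.stack([embed[t] for t in tokens])


# Toy "image": regions matching "dog" and "grass", plus one unrelated region.
regions = np.stack([3 * embed["dog"], 3 * embed["grass"], np.eye(d)[7]])

caption = ["a", "dog", "on", "grass"]
negatives = [substitute_word(caption, 1, "cat"), substitute_word(caption, 3, "sofa")]

pos = compatibility(encode(caption), regions)
neg = [compatibility(encode(c), regions) for c in negatives]
loss = info_nce_loss(pos, neg)
print("grounding:", ground(encode(caption), regions))
print("positive score:", round(pos, 3), "negative scores:", [round(s, 3) for s in neg])
print("InfoNCE loss:", round(loss, 3))
```

With \(K\) negatives, the InfoNCE loss \(\mathcal{L}\) satisfies the standard bound \(I \ge \log(K+1) - \mathcal{L}\), so lowering the loss raises the mutual-information lower bound. In this toy run the one-word negatives score below the true caption because the substituted words match no region. The paper's claim is that such near-miss negatives, proposed by a language model, provide a stronger training signal than randomly sampled captions.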