Learning Unsupervised Visual Grounding Through Semantic Self-Supervision
Authors: | Shreyas Saxena, Vineet Gandhi, Syed Ashar Javed |
---|---|
Language: | English |
Year of publication: | 2018 |
Subject: |
FOS: Computer and information sciences
0209 industrial biotechnology; Modalities; Computer science; business.industry; Property (programming); Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; 02 engineering and technology; Semantic property; computer.software_genre; Task (project management); 020901 industrial engineering & automation; Concept learning; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; State (computer science); business; Set (psychology); computer; Natural language; Natural language processing |
Source: | IJCAI |
Description: | Localizing natural language phrases in images is a challenging problem that requires joint understanding of both the textual and visual modalities. In the unsupervised setting, the lack of supervisory signals exacerbates this difficulty. In this paper, we propose a novel framework for unsupervised visual grounding which uses concept learning as a proxy task to obtain self-supervision. The intuition behind this idea is to encourage the model to localize to regions that can explain some semantic property in the data; in our case, the property is the presence of a concept in a set of images. We present thorough quantitative and qualitative experiments to demonstrate the efficacy of our approach and show a 5.6% improvement over the current state of the art on the Visual Genome dataset, a 5.8% improvement on the ReferItGame dataset, and performance comparable to the state of the art on the Flickr30k dataset. NIPS Workshop 2018 |
Database: | OpenAIRE |