Learning Words by Drawing Images

Author:	Dídac Surís, Antonio Torralba, James Glass, David Bau, David Harwath, Adrià Recasens
Year of publication:	2019
Subject:	Computer science; business.industry; 02 engineering and technology; Visual reasoning; 010501 environmental sciences; computer.software_genre; 01 natural sciences; Image (mathematics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Curriculum; Natural language processing; 0105 earth and related environmental sciences
Source:	CVPR
DOI:	10.1109/cvpr.2019.00213
Description:	We propose a framework for learning through drawing. Our goal is to learn the correspondence between spoken words and abstract visual attributes, from a dataset of spoken descriptions of images. Building upon recent findings that GAN representations can be manipulated to edit semantic concepts in the generated output, we propose a new method to use such GAN-generated images to train a model using a triplet loss. To apply the method, we develop Audio CLEVRGAN, a new dataset of audio descriptions of GAN-generated CLEVR images, and we describe a training procedure that creates a curriculum of GAN-generated images that focuses training on image pairs that differ in a specific, informative way. Training is done without additional supervision beyond the spoken captions and the GAN. We find that training that takes advantage of GAN-generated edited examples results in improvements in the model's ability to learn attributes compared to previous results. Our proposed learning framework also results in models that can associate spoken words with some abstract visual concepts such as color and size.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::3d2db94ff352d9df672c67f4601516d5 https://doi.org/10.1109/cvpr.2019.00213