GuessWhat?! Visual object discovery through multi-modal dialogue

Authors:	Olivier Pietquin, Aaron Courville, Sarath Chandar, Harm de Vries, Hugo Larochelle, Florian Strub
Contributors:	Université de Montréal (UdeM); Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria); Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS); Twitter Inc; Projet IGLU
Language:	English
Publication year:	2017
Subject:	FOS: Computer and information sciences; Computer science; Computer Science - Artificial Intelligence; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; 02 engineering and technology; 010501 environmental sciences; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; 01 natural sciences; Oracle; Task (project management); [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Deep learning; Knowledge extraction; 0202 electrical engineering, electronic engineering, information engineering; 0105 earth and related environmental sciences; Dialog Systems; business.industry; ComputingMilieux_PERSONALCOMPUTING; Object (computer science); Visualization; Artificial Intelligence (cs.AI); 020201 artificial intelligence & image processing; Computer vision; Artificial intelligence; business; Natural language
Source:	Conference on Computer Vision and Pattern Recognition (CVPR), Jul 2017, Honolulu, United States
Description:	We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks. Comment: 23 pages; CVPR 2017 submission; see https://guesswhat.ai
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0180df5cad0f648f548e75de573622a9 https://hal.inria.fr/hal-01549641/file/1611.08481.pdf