GuessWhat?! Visual object discovery through multi-modal dialogue

Authors: Olivier Pietquin, Aaron Courville, Sarath Chandar, Harm de Vries, Hugo Larochelle, Florian Strub
Contributors: Université de Montréal (UdeM), Sequential Learning (SEQUEL), Inria Lille - Nord Europe, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS)-Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL), Centrale Lille-Université de Lille-Centre National de la Recherche Scientifique (CNRS), Twitter, Projet IGLU, Twitter Inc
Language: English
Year of publication: 2017
Subject:
FOS: Computer and information sciences
Computer science
Computer Science - Artificial Intelligence
Computer Vision and Pattern Recognition (cs.CV)
Computer Science - Computer Vision and Pattern Recognition
02 engineering and technology
010501 environmental sciences
[INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]
01 natural sciences
Oracle
Task (project management)
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Deep Learning
Knowledge extraction
0202 electrical engineering, electronic engineering, information engineering
0105 earth and related environmental sciences
Dialog Systems
business.industry
Deep learning
ComputingMilieux_PERSONALCOMPUTING
Object (computer science)
Visualization
Artificial Intelligence (cs.AI)
020201 artificial intelligence & image processing
Computer vision
Artificial intelligence
business
Natural language
Source: Conference on Computer Vision and Pattern Recognition
Conference on Computer Vision and Pattern Recognition, Jul 2017, Honolulu, United States
CVPR
Description: We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.
Comment: 23 pages; CVPR 2017 submission; see https://guesswhat.ai
Database: OpenAIRE