VirtualHome: Simulating Household Activities via Programs
Authors: | Marko Boben, Kevin Ra, Jiaman Li, Sanja Fidler, Antonio Torralba, Xavier Puig, Tingwu Wang |
---|---|
Contributors: | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
Year of publication: | 2018 |
Subject: |
FOS: Computer and information sciences
Computer Science - Artificial Intelligence; business.industry; Interface (Java); Computer science; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; 020207 software engineering; 02 engineering and technology; Variety (cybernetics); Machine Learning (cs.LG); Computer Science - Learning; Task (computing); Artificial Intelligence (cs.AI); Human–computer interaction; 0202 electrical engineering, electronic engineering, information engineering; Task analysis; Code (cryptography); Robot; 020201 artificial intelligence & image processing; Artificial intelligence; Representation (mathematics); business; Natural language |
Source: | CVPR; arXiv |
DOI: | 10.48550/arxiv.1806.07011 |
Description: | In this paper, we are interested in modeling complex activities that occur in a typical household. We propose to use programs, i.e., sequences of atomic actions and interactions, as a high level representation of complex tasks. Programs are interesting because they provide a non-ambiguous representation of a task, and allow agents to execute them. However, nowadays, there is no database providing this type of information. Towards this goal, we first crowd-source programs for a variety of activities that happen in people's homes, via a game-like interface used for teaching kids how to code. Using the collected dataset, we show how we can learn to extract programs directly from natural language descriptions or from videos. We then implement the most common atomic (inter)actions in the Unity3D game engine, and use our programs to 'drive' an artificial agent to execute tasks in a simulated household environment. Our VirtualHome simulator allows us to create a large activity video dataset with rich ground-truth, enabling training and testing of video understanding models. We further showcase examples of our agent performing tasks in our VirtualHome based on language descriptions. © 2018 IEEE. Funding: NSERC COHESA NETGP485577-15; IARPA D17PC00341 |
Database: | OpenAIRE |
External link: |