QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

Author:	Mehan, Yash; Gupta, Kumaraditya; Jayanti, Rohit; Govil, Anirudh; Garg, Sourav; Krishna, Madhava
Year of publication:	2024
Subject:	Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
Document type:	Working Paper
Description:	Understanding the structural organisation of 3D indoor scenes in terms of rooms is often accomplished via floorplan extraction. Robotic tasks such as planning and navigation require a semantic understanding of the scene as well. This is typically achieved via object-level semantic segmentation. However, such methods struggle to segment out topological regions like "kitchen" in the scene. In this work, we introduce a two-step pipeline. First, we extract a topological map, i.e., floorplan of the indoor scene using a novel multi-channel occupancy representation. Then, we generate CLIP-aligned features and semantic labels for every room instance based on the objects it contains using a self-attention transformer. Our language-topology alignment supports natural language querying, e.g., a "place to cook" locates the "kitchen". We outperform the current state-of-the-art on room segmentation by ~20% and room classification by ~12%. Our detailed qualitative analysis and ablation studies provide insights into the problem of joint structural and semantic 3D scene understanding.
Database:	arXiv
External link:	http://arxiv.org/abs/2404.06442
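
The description mentions natural language querying over room-level, CLIP-aligned features (a "place to cook" locating the "kitchen"). The following is a minimal sketch of how such a query could be resolved once embeddings exist, assuming cosine similarity is used for ranking. It is not the authors' implementation: the function `rank_rooms`, the 3-dimensional toy vectors, and the room labels are all hypothetical stand-ins, and no real CLIP model is called.

```python
import numpy as np


def rank_rooms(query_vec, room_feats):
    """Rank rooms by cosine similarity to a query embedding, best match first.

    query_vec  : 1-D array, embedding of the text query (hypothetical here).
    room_feats : dict mapping room label -> 1-D array of the same length.
    """
    labels = list(room_feats)
    feats = np.stack([room_feats[k] for k in labels]).astype(float)
    # Normalise rows so that a dot product equals cosine similarity.
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    scores = feats @ q
    order = np.argsort(-scores)  # descending by similarity
    return [(labels[i], float(scores[i])) for i in order]


# Toy 3-D vectors standing in for per-room features (assumed, not real data).
room_feats = {
    "kitchen": np.array([0.9, 0.1, 0.0]),
    "bedroom": np.array([0.1, 0.9, 0.1]),
    "bathroom": np.array([0.0, 0.2, 0.9]),
}

# Stand-in for the text embedding of the query "place to cook".
query_vec = np.array([0.8, 0.2, 0.1])

ranking = rank_rooms(query_vec, room_feats)
best_label = ranking[0][0]
print(best_label)
```

In a real system the toy vectors would be replaced by the room features and the text embedding produced by the respective encoders; the ranking step itself would stay the same under the cosine-similarity assumption.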