Showing 1 - 10 of 36,536 for search: '"Spatial reasoning"'
Authors:
Sun, Qi, Hong, Pengfei, Pala, Tej Deep, Toh, Vernon, Tan, U-Xuan, Ghosal, Deepanway, Poria, Soujanya
Traditional reinforcement learning-based robotic control methods are often task-specific and fail to generalize across diverse environments or unseen objects and instructions. Visual Language Models (VLMs) demonstrate strong scene understanding and p
External link:
http://arxiv.org/abs/2412.11974
3D spatial reasoning is the ability to analyze and interpret the positions, orientations, and spatial relationships of objects within the 3D space. This allows models to develop a comprehensive understanding of the 3D scene, enabling their applicabil
External link:
http://arxiv.org/abs/2412.07825
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often struggle with spatial reasoning. This paper presents a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities through
External link:
http://arxiv.org/abs/2411.18564
Authors:
Shiri, Fatemeh, Guo, Xiao-Yu, Far, Mona Golestan, Yu, Xin, Haffari, Gholamreza, Li, Yuan-Fang
Large Multimodal Models (LMMs) have achieved strong performance across a range of vision and language tasks. However, their spatial reasoning capabilities are under-investigated. In this paper, we construct a novel VQA dataset, Spatial-MM, to compreh
External link:
http://arxiv.org/abs/2411.06048
Authors:
Tang, Yihong, Qu, Ao, Wang, Zhaokai, Zhuang, Dingyi, Wu, Zhaofeng, Ma, Wei, Wang, Shenhao, Zheng, Yunhan, Zhao, Zhan, Zhao, Jinhua
Vision language models (VLMs) have demonstrated impressive performance across a wide range of downstream tasks. However, their proficiency in spatial reasoning remains limited, despite its crucial role in tasks involving navigation and interaction wi
External link:
http://arxiv.org/abs/2410.16162
Reasoning about spatial relationships between objects is essential for many real-world robotic tasks, such as fetch-and-delivery, object rearrangement, and object search. The ability to detect and disambiguate different objects and identify their loc
External link:
http://arxiv.org/abs/2410.07394
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
The Zero-Shot Object Navigation (ZSON) task requires embodied agents to find a previously unseen object by navigating in unfamiliar environments. Such a goal-oriented exploration heavily relies on the ability to perceive, understand, and reason based
External link:
http://arxiv.org/abs/2411.16425