Showing 1 - 10 of 2,303
for search: '"Wang, Tai"'
6D object pose estimation aims at determining an object's translation, rotation, and scale, typically from a single RGBD image. Recent advancements have expanded this estimation from instance-level to category-level, allowing models to generalize across…
External link:
http://arxiv.org/abs/2409.18261
Recent advancements in Large Multimodal Models (LMMs) have greatly enhanced their proficiency in 2D visual understanding tasks, enabling them to effectively process and understand images and videos. However, the development of LMMs with 3D-awareness…
External link:
http://arxiv.org/abs/2409.18125
Author:
Wang, Hanqing, Chen, Jiahe, Huang, Wensi, Ben, Qingwei, Wang, Tai, Mi, Boyu, Huang, Tao, Zhao, Siheng, Chen, Yilun, Yang, Sizhe, Cao, Peizhou, Yu, Wenye, Ye, Zichao, Li, Jialun, Long, Junfeng, Wang, Zirui, Wang, Huiling, Zhao, Ying, Tu, Zhongying, Qiao, Yu, Lin, Dahua, Pang, Jiangmiao
Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models…
External link:
http://arxiv.org/abs/2407.10943
Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language…
External link:
http://arxiv.org/abs/2407.09016
Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason human intentions from implicit instructions. We propose a new task called 3D reasoning…
External link:
http://arxiv.org/abs/2407.01525
Author:
Gao, Jiawei, Wang, Ziqin, Xiao, Zeqi, Wang, Jingbo, Wang, Tai, Cao, Jinkun, Hu, Xiaolin, Liu, Si, Dai, Jifeng, Pang, Jiangmiao
Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large…
External link:
http://arxiv.org/abs/2406.14558
Author:
Lyu, Ruiyuan, Wang, Tai, Lin, Jingli, Yang, Shuai, Mao, Xiaohan, Chen, Yilun, Xu, Runsen, Huang, Haifeng, Zhu, Chenming, Lin, Dahua, Pang, Jiangmiao
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works…
External link:
http://arxiv.org/abs/2406.09401
An empirical analysis, suggested by optimal Merton dynamics, reveals some unexpected features of asset volumes. These features are connected to traders' belief and risk aversion. This paper proposes a trading strategy model in the optimal Merton framework…
External link:
http://arxiv.org/abs/2406.05854
Author:
Sun, Jiahao, Qing, Chunmei, Xu, Xiang, Kong, Lingdong, Liu, Youquan, Li, Li, Zhu, Chenming, Zhang, Jingwei, Xiao, Zeqi, Chen, Runnan, Wang, Tai, Zhang, Wenwei, Chen, Kai
In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and failing…
External link:
http://arxiv.org/abs/2405.14870
Author:
Chen, Yilun, Yang, Shuai, Huang, Haifeng, Wang, Tai, Lyu, Ruiyuan, Xu, Runsen, Lin, Dahua, Pang, Jiangmiao
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D LMMs)…
External link:
http://arxiv.org/abs/2405.10370