Showing 1 - 10 of 2,303
for search: '"Wang, Tai"'
6D object pose estimation aims at determining an object's translation, rotation, and scale, typically from a single RGBD image. Recent advancements have expanded this estimation from instance-level to category-level, allowing models to generalize across…
External link:
http://arxiv.org/abs/2409.18261
Recent advancements in Large Multimodal Models (LMMs) have greatly enhanced their proficiency in 2D visual understanding tasks, enabling them to effectively process and understand images and videos. However, the development of LMMs with 3D-awareness…
External link:
http://arxiv.org/abs/2409.18125
Author:
Wang, Hanqing, Chen, Jiahe, Huang, Wensi, Ben, Qingwei, Wang, Tai, Mi, Boyu, Huang, Tao, Zhao, Siheng, Chen, Yilun, Yang, Sizhe, Cao, Peizhou, Yu, Wenye, Ye, Zichao, Li, Jialun, Long, Junfeng, Wang, Zirui, Wang, Huiling, Zhao, Ying, Tu, Zhongying, Qiao, Yu, Lin, Dahua, Pang, Jiangmiao
Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models…
External link:
http://arxiv.org/abs/2407.10943
Object-oriented embodied navigation aims to locate specific objects, defined by category or depicted in images. Existing methods often struggle to generalize to open vocabulary goals without extensive training data. While recent advances in Vision-Language…
External link:
http://arxiv.org/abs/2407.09016
Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason human intentions from implicit instructions. We propose a new task called 3D reasoning…
External link:
http://arxiv.org/abs/2407.01525
Author:
Gao, Jiawei, Wang, Ziqin, Xiao, Zeqi, Wang, Jingbo, Wang, Tai, Cao, Jinkun, Hu, Xiaolin, Liu, Si, Dai, Jifeng, Pang, Jiangmiao
Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large…
External link:
http://arxiv.org/abs/2406.14558
Author:
Lyu, Ruiyuan, Wang, Tai, Lin, Jingli, Yang, Shuai, Mao, Xiaohan, Chen, Yilun, Xu, Runsen, Huang, Haifeng, Zhu, Chenming, Lin, Dahua, Pang, Jiangmiao
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous works…
External link:
http://arxiv.org/abs/2406.09401
An empirical analysis, suggested by optimal Merton dynamics, reveals some unexpected features of asset volumes. These features are connected to traders' belief and risk aversion. This paper proposes a trading strategy model in the optimal Merton framework…
External link:
http://arxiv.org/abs/2406.05854
Author:
Sun, Jiahao, Qing, Chunmei, Xu, Xiang, Kong, Lingdong, Liu, Youquan, Li, Li, Zhu, Chenming, Zhang, Jingwei, Xiao, Zeqi, Chen, Runnan, Wang, Tai, Zhang, Wenwei, Chen, Kai
In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and failing…
External link:
http://arxiv.org/abs/2405.14870
Author:
Chen, Yilun, Yang, Shuai, Huang, Haifeng, Wang, Tai, Lyu, Ruiyuan, Xu, Runsen, Lin, Dahua, Pang, Jiangmiao
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D LMMs)…
External link:
http://arxiv.org/abs/2405.10370