Showing 1 - 10 of 39,464 for search: '"world model"'
Author:
Chi, Xiaowei, Zhang, Hengyuan, Fan, Chun-Kai, Qi, Xingqun, Zhang, Rongyu, Chen, Anthony, Chan, Chi-min, Xue, Wei, Luo, Wenhan, Zhang, Shanghang, Guo, Yike
World models integrate raw data from various modalities, such as images and language, to simulate comprehensive interactions in the world, thereby playing crucial roles in fields like mixed reality and robotics. Yet, applying the world model for ac
External link:
http://arxiv.org/abs/2410.15461
A longstanding goal of artificial general intelligence is highly capable generalists that can learn from diverse experiences and generalize to unseen tasks. The language and vision communities have seen remarkable progress toward this trend by scalin
External link:
http://arxiv.org/abs/2410.11448
Author:
Gu, Songen, Yin, Wei, Jin, Bu, Guo, Xiaoyang, Wang, Junming, Li, Haodong, Zhang, Qian, Long, Xiaoxiao
We propose DOME, a diffusion-based world model that predicts future occupancy frames based on past occupancy observations. The ability of this world model to capture the evolution of the environment is crucial for planning in autonomous driving. Comp
External link:
http://arxiv.org/abs/2410.10429
Author:
Zhang, Kaidong, Ren, Pengzhen, Lin, Bingqian, Lin, Junfan, Ma, Shikui, Xu, Hang, Liang, Xiaodan
Language-guided robotic manipulation is a challenging task that requires an embodied agent to follow abstract user instructions to accomplish various complex manipulation tasks. Previous work trivially fitting the data without revealing the relation
External link:
http://arxiv.org/abs/2410.10394
Author:
Zhou, Siyu, Zhou, Tianyi, Yang, Yijun, Long, Guodong, Ye, Deheng, Jiang, Jing, Zhang, Chengqi
Can large language models (LLMs) directly serve as powerful world models for model-based agents? While the gaps between the prior knowledge of LLMs and the specified environment's dynamics do exist, our study reveals that the gaps can be bridged by a
External link:
http://arxiv.org/abs/2410.07484
Enhancing the reasoning capabilities of large language models (LLMs) remains a key challenge, especially for tasks that require complex, multi-step decision-making. Humans excel at these tasks by leveraging deliberate planning with an internal world
External link:
http://arxiv.org/abs/2410.03136
Author:
Liu, Zeyang, Yang, Xinrui, Sun, Shiguang, Qian, Long, Wan, Lipeng, Chen, Xingyu, Lan, Xuguang
Recent progress in generative models has stimulated significant innovations in many fields, such as image generation and chatbots. Despite their success, these models often produce sketchy and misleading solutions for complex multi-agent decision-mak
External link:
http://arxiv.org/abs/2410.02664
Author:
Lai, Hang, Cao, Jiahang, Xu, Jiafeng, Wu, Hongtao, Lin, Yunfeng, Kong, Tao, Yu, Yong, Zhang, Weinan
Legged locomotion over various terrains is challenging and requires precise perception of the robot and its surroundings from both proprioception and vision. However, learning directly from high-dimensional visual input is often data-inefficient and
External link:
http://arxiv.org/abs/2409.16784
Author:
Xiao, Lingyu, Liu, Jiang-Jiang, Yang, Sen, Li, Xiaofan, Ye, Xiaoqing, Yang, Wankou, Wang, Jingdong
The autoregressive world model exhibits robust generalization capabilities in vectorized scene understanding but encounters difficulties in deriving actions due to insufficient uncertainty modeling and self-delusion. In this paper, we explore the fea
External link:
http://arxiv.org/abs/2409.15730
Author:
Yan, Ziyang, Dong, Wenzhen, Shao, Yihua, Lu, Yuhang, Haiyang, Liu, Liu, Jingwen, Wang, Haozhe, Wang, Zhe, Wang, Yan, Remondino, Fabio, Ma, Yuexin
End-to-end autonomous driving with vision-only is not only more cost-effective compared to LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose Ren
External link:
http://arxiv.org/abs/2409.11356