Showing 1 - 10 of 17,849 for search: '"An Fei Li"'
Humans possess the visual-spatial intelligence to remember spaces from sequential visual observations. However, can Multimodal Large Language Models (MLLMs) trained on million-scale video datasets also "think in space" from videos? We present a novel…
External link:
http://arxiv.org/abs/2412.14171
Author:
Chen, Changan, Zhang, Juze, Lakshmikanth, Shrinidhi K., Fang, Yusu, Shao, Ruizhi, Wetzstein, Gordon, Fei-Fei, Li, Adeli, Ehsan
Human communication is inherently multimodal, involving a combination of verbal and non-verbal cues such as speech, facial expressions, and body gestures. Modeling these behaviors is essential for understanding human interaction and for creating virtual…
External link:
http://arxiv.org/abs/2412.10523
Author:
Chandrasegaran, Keshigeyan, Gupta, Agrim, Hadzic, Lea M., Kota, Taran, He, Jimming, Eyzaguirre, Cristóbal, Durante, Zane, Li, Manling, Wu, Jiajun, Fei-Fei, Li
We present HourVideo, a benchmark dataset for hour-long video-language understanding. Our dataset consists of a novel task suite comprising summarization, perception (recall, tracking), visual reasoning (spatial, temporal, predictive, causal, counterfactual)…
External link:
http://arxiv.org/abs/2411.04998
Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for…
External link:
http://arxiv.org/abs/2410.08464
Author:
Dai, Tianyuan, Wong, Josiah, Jiang, Yunfan, Wang, Chen, Gokmen, Cem, Zhang, Ruohan, Wu, Jiajun, Fei-Fei, Li
Training robot policies in the real world can be unsafe, costly, and difficult to scale. Simulation serves as an inexpensive and potentially limitless source of training data, but suffers from the semantics and physics disparity between simulated and…
External link:
http://arxiv.org/abs/2410.07408
Author:
Li, Manling, Zhao, Shiyu, Wang, Qineng, Wang, Kangrui, Zhou, Yu, Srivastava, Sanjana, Gokmen, Cem, Lee, Tony, Li, Li Erran, Zhang, Ruohan, Liu, Weiyu, Liang, Percy, Fei-Fei, Li, Mao, Jiayuan, Wu, Jiajun
We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because…
External link:
http://arxiv.org/abs/2410.07166
Representing robotic manipulation tasks as constraints that associate the robot and the environment is a promising way to encode desired robot behaviors. However, it remains unclear how to formulate the constraints such that they are 1) versatile to…
External link:
http://arxiv.org/abs/2409.01652
Most existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of…
External link:
http://arxiv.org/abs/2407.00316
Author:
Durante, Zane, Harries, Robathan, Vendrow, Edward, Luo, Zelun, Kyuragi, Yuta, Kozuka, Kazuki, Fei-Fei, Li, Adeli, Ehsan
Understanding Activities of Daily Living (ADLs) is a crucial step for different applications including assistive robots, smart homes, and healthcare. However, to date, few benchmarks and methods have focused on complex ADLs, especially those involving…
External link:
http://arxiv.org/abs/2406.01662
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain…
External link:
http://arxiv.org/abs/2405.10315