Showing 1 - 10 of 17,849 for search: '"An Fei Li"'
Humans possess the visual-spatial intelligence to remember spaces from sequential visual observations. However, can Multimodal Large Language Models (MLLMs) trained on million-scale video datasets also "think in space" from videos? We present a novel…
External link:
http://arxiv.org/abs/2412.14171
Author:
Chen, Changan, Zhang, Juze, Lakshmikanth, Shrinidhi K., Fang, Yusu, Shao, Ruizhi, Wetzstein, Gordon, Fei-Fei, Li, Adeli, Ehsan
Human communication is inherently multimodal, involving a combination of verbal and non-verbal cues such as speech, facial expressions, and body gestures. Modeling these behaviors is essential for understanding human interaction and for creating virtual…
External link:
http://arxiv.org/abs/2412.10523
Author:
Chandrasegaran, Keshigeyan, Gupta, Agrim, Hadzic, Lea M., Kota, Taran, He, Jimming, Eyzaguirre, Cristóbal, Durante, Zane, Li, Manling, Wu, Jiajun, Fei-Fei, Li
We present HourVideo, a benchmark dataset for hour-long video-language understanding. Our dataset consists of a novel task suite comprising summarization, perception (recall, tracking), visual reasoning (spatial, temporal, predictive, causal, counterfactual)…
External link:
http://arxiv.org/abs/2411.04998
Recent progress in imitation learning from human demonstrations has shown promising results in teaching robots manipulation skills. To further scale up training datasets, recent works start to use portable data collection devices without the need for…
External link:
http://arxiv.org/abs/2410.08464
Author:
Dai, Tianyuan, Wong, Josiah, Jiang, Yunfan, Wang, Chen, Gokmen, Cem, Zhang, Ruohan, Wu, Jiajun, Fei-Fei, Li
Training robot policies in the real world can be unsafe, costly, and difficult to scale. Simulation serves as an inexpensive and potentially limitless source of training data, but suffers from the semantics and physics disparity between simulated and…
External link:
http://arxiv.org/abs/2410.07408
Author:
Li, Manling, Zhao, Shiyu, Wang, Qineng, Wang, Kangrui, Zhou, Yu, Srivastava, Sanjana, Gokmen, Cem, Lee, Tony, Li, Li Erran, Zhang, Ruohan, Liu, Weiyu, Liang, Percy, Fei-Fei, Li, Mao, Jiayuan, Wu, Jiajun
We aim to evaluate Large Language Models (LLMs) for embodied decision making. While a significant body of work has been leveraging LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance because…
External link:
http://arxiv.org/abs/2410.07166
Representing robotic manipulation tasks as constraints that associate the robot and the environment is a promising way to encode desired robot behaviors. However, it remains unclear how to formulate the constraints such that they are 1) versatile to…
External link:
http://arxiv.org/abs/2409.01652
Most existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of…
External link:
http://arxiv.org/abs/2407.00316
Author:
Durante, Zane, Harries, Robathan, Vendrow, Edward, Luo, Zelun, Kyuragi, Yuta, Kozuka, Kazuki, Fei-Fei, Li, Adeli, Ehsan
Understanding Activities of Daily Living (ADLs) is a crucial step for different applications including assistive robots, smart homes, and healthcare. However, to date, few benchmarks and methods have focused on complex ADLs, especially those involving…
External link:
http://arxiv.org/abs/2406.01662
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain…
External link:
http://arxiv.org/abs/2405.10315