Showing 1 - 10 of 126 results for search: "Kembhavi, Aniruddha"
Authors:
Zeng, Kuo-Hao, Zhang, Zichen, Ehsani, Kiana, Hendrix, Rose, Salvador, Jordi, Herrasti, Alvaro, Girshick, Ross, Kembhavi, Aniruddha, Weihs, Luca
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained end-to-end with reinforcement learning at scale that generalizes to the real world without adaptation despite being trained purely in simulation. PoliFormer uses
External link:
http://arxiv.org/abs/2406.20083
We present CodeNav, an LLM agent that navigates and leverages previously unseen code repositories to solve user queries. In contrast to tool-use LLM agents that require "registration" of all relevant tools via manual descriptions within the LLM con
External link:
http://arxiv.org/abs/2406.12276
Authors:
Zhang, Jieyu, Huang, Weikai, Ma, Zixian, Michel, Oscar, He, Dong, Gupta, Tanmay, Ma, Wei-Chiu, Farhadi, Ali, Kembhavi, Aniruddha, Krishna, Ranjay
Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for thei
External link:
http://arxiv.org/abs/2406.11775
We present Piva (Preserving Identity with Variational Score Distillation), a novel optimization-based method for editing images and 3D models based on diffusion models. Specifically, our approach is inspired by the recently proposed method for 2D ima
External link:
http://arxiv.org/abs/2406.08953
A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, recent investigations find that most, if not all, of our
External link:
http://arxiv.org/abs/2404.02145
Authors:
Lu, Jiasen, Clark, Christopher, Lee, Sangho, Zhang, Zichen, Khosla, Savya, Marten, Ryan, Hoiem, Derek, Kembhavi, Aniruddha
We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action. To unify different modalities, we tokenize inputs and outputs: images, text, audio, action, bounding
External link:
http://arxiv.org/abs/2312.17172
Customizing robotic behaviors to be aligned with diverse human preferences is an underexplored challenge in the field of embodied AI. In this paper, we present Promptable Behaviors, a novel framework that facilitates efficient personalization of robo
External link:
http://arxiv.org/abs/2312.09337
Authors:
Yang, Yue, Sun, Fan-Yun, Weihs, Luca, VanderBilt, Eli, Herrasti, Alvaro, Han, Winson, Wu, Jiajun, Haber, Nick, Krishna, Ranjay, Liu, Lingjie, Callison-Burch, Chris, Yatskar, Mark, Kembhavi, Aniruddha, Clark, Christopher
3D simulated environments play a critical role in Embodied AI, but their creation requires expertise and extensive manual effort, restricting their diversity and scope. To mitigate this limitation, we present Holodeck, a system that generates 3D envi
External link:
http://arxiv.org/abs/2312.09067
Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots still struggle with many household tasks requiring coordinated behaviors such as opening doors. The factoriz
External link:
http://arxiv.org/abs/2312.06639
Authors:
Ehsani, Kiana, Gupta, Tanmay, Hendrix, Rose, Salvador, Jordi, Weihs, Luca, Zeng, Kuo-Hao, Singh, Kunal Pratap, Kim, Yejin, Han, Winson, Herrasti, Alvaro, Krishna, Ranjay, Schwenk, Dustin, VanderBilt, Eli, Kembhavi, Aniruddha
Reinforcement learning (RL) with dense rewards and imitation learning (IL) with human-generated trajectories are the most widely used approaches for training modern embodied agents. RL requires extensive reward shaping and auxiliary losses and is oft
External link:
http://arxiv.org/abs/2312.02976