Showing 1 - 10
of 79
for the search: '"Ichter, Brian"'
Author:
Sathyamoorthy, Adarsh Jagan, Weerakoon, Kasun, Elnoor, Mohamed, Zore, Anuj, Ichter, Brian, Xia, Fei, Tan, Jie, Yu, Wenhao, Manocha, Dinesh
We present ConVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to …
External link:
http://arxiv.org/abs/2403.15637
Author:
Liang, Jacky, Xia, Fei, Yu, Wenhao, Zeng, Andy, Arenas, Montserrat Gonzalez, Attarian, Maria, Bauza, Maria, Bennice, Matthew, Bewley, Alex, Dostmohamed, Adil, Fu, Chuyuan Kelly, Gileadi, Nimrod, Giustina, Marissa, Gopalakrishnan, Keerthana, Hasenclever, Leonard, Humplik, Jan, Hsu, Jasmine, Joshi, Nikhil, Jyenis, Ben, Kew, Chase, Kirmani, Sean, Lee, Tsang-Wei Edward, Lee, Kuang-Huei, Michaely, Assaf Hurwitz, Moore, Joss, Oslund, Ken, Rao, Dushyant, Ren, Allen, Tabanpour, Baruch, Vuong, Quan, Wahid, Ayzaan, Xiao, Ted, Xu, Ying, Zhuang, Vincent, Xu, Peng, Frey, Erik, Caluwaerts, Ken, Zhang, Tingnan, Ichter, Brian, Tompson, Jonathan, Takayama, Leila, Vanhoucke, Vincent, Shafran, Izhak, Mataric, Maja, Sadigh, Dorsa, Heess, Nicolas, Rao, Kanishka, Stewart, Nik, Tan, Jie, Parada, Carolina
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new …
External link:
http://arxiv.org/abs/2402.11450
Author:
Nasiriany, Soroush, Xia, Fei, Yu, Wenhao, Xiao, Ted, Liang, Jacky, Dasgupta, Ishita, Xie, Annie, Driess, Danny, Wahid, Ayzaan, Xu, Zhuo, Vuong, Quan, Zhang, Tingnan, Lee, Tsang-Wei Edward, Lee, Kuang-Huei, Xu, Peng, Kirmani, Sean, Zhu, Yuke, Zeng, Andy, Hausman, Karol, Heess, Nicolas, Finn, Chelsea, Levine, Sergey, Ichter, Brian
Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce …
External link:
http://arxiv.org/abs/2402.07872
Author:
Ahn, Michael, Dwibedi, Debidatta, Finn, Chelsea, Arenas, Montse Gonzalez, Gopalakrishnan, Keerthana, Hausman, Karol, Ichter, Brian, Irpan, Alex, Joshi, Nikhil, Julian, Ryan, Kirmani, Sean, Leal, Isabel, Lee, Edward, Levine, Sergey, Lu, Yao, Maddineni, Sharath, Rao, Kanishka, Sadigh, Dorsa, Sanketi, Pannag, Sermanet, Pierre, Vuong, Quan, Welker, Stefan, Xia, Fei, Xiao, Ted, Xu, Peng, Xu, Steve, Xu, Zhuo
Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is …
External link:
http://arxiv.org/abs/2401.12963
Author:
Chen, Boyuan, Xu, Zhuo, Kirmani, Sean, Ichter, Brian, Driess, Danny, Florence, Pete, Sadigh, Dorsa, Guibas, Leonidas, Xia, Fei
Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still …
External link:
http://arxiv.org/abs/2401.12168
Author:
Firoozi, Roya, Tucker, Johnathan, Tian, Stephen, Majumdar, Anirudha, Sun, Jiankai, Liu, Weiyu, Zhu, Yuke, Song, Shuran, Kapoor, Ashish, Hausman, Karol, Ichter, Brian, Driess, Danny, Wu, Jiajun, Lu, Cewu, Schwager, Mac
We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, …
External link:
http://arxiv.org/abs/2312.07843
Author:
Li, Chengshu, Liang, Jacky, Zeng, Andy, Chen, Xinyun, Hausman, Karol, Sadigh, Dorsa, Levine, Sergey, Fei-Fei, Li, Xia, Fei, Ichter, Brian
Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not …
External link:
http://arxiv.org/abs/2312.04474
Author:
Sermanet, Pierre, Ding, Tianli, Zhao, Jeffrey, Xia, Fei, Dwibedi, Debidatta, Gopalakrishnan, Keerthana, Chan, Christine, Dulac-Arnold, Gabriel, Maddineni, Sharath, Joshi, Nikhil J, Florence, Pete, Han, Wei, Baruch, Robert, Lu, Yao, Mirchandani, Suvir, Xu, Peng, Sanketi, Pannag, Hausman, Karol, Shafran, Izhak, Ichter, Brian, Cao, Yuan
We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step …
External link:
http://arxiv.org/abs/2311.00899
This paper combines two contributions. First, we introduce an extension of the Meta-World benchmark, which we call "Language-World," which allows a large language model to operate in a simulated robotic environment using semi-structured natural …
External link:
http://arxiv.org/abs/2310.17019
Author:
Du, Yilun, Yang, Mengjiao, Florence, Pete, Xia, Fei, Wahid, Ayzaan, Ichter, Brian, Sermanet, Pierre, Yu, Tianhe, Abbeel, Pieter, Tenenbaum, Joshua B., Kaelbling, Leslie, Zeng, Andy, Tompson, Jonathan
We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present video …
External link:
http://arxiv.org/abs/2310.10625