Showing 1 - 10
of 79
for the search: '"Ichter, Brian"'
Author:
Sathyamoorthy, Adarsh Jagan, Weerakoon, Kasun, Elnoor, Mohamed, Zore, Anuj, Ichter, Brian, Xia, Fei, Tan, Jie, Yu, Wenhao, Manocha, Dinesh
We present ConVOI, a novel method for autonomous robot navigation in real-world indoor and outdoor environments using Vision Language Models (VLMs). We employ VLMs in two ways: first, we leverage their zero-shot image classification capability to …
External link:
http://arxiv.org/abs/2403.15637
Author:
Liang, Jacky, Xia, Fei, Yu, Wenhao, Zeng, Andy, Arenas, Montserrat Gonzalez, Attarian, Maria, Bauza, Maria, Bennice, Matthew, Bewley, Alex, Dostmohamed, Adil, Fu, Chuyuan Kelly, Gileadi, Nimrod, Giustina, Marissa, Gopalakrishnan, Keerthana, Hasenclever, Leonard, Humplik, Jan, Hsu, Jasmine, Joshi, Nikhil, Jyenis, Ben, Kew, Chase, Kirmani, Sean, Lee, Tsang-Wei Edward, Lee, Kuang-Huei, Michaely, Assaf Hurwitz, Moore, Joss, Oslund, Ken, Rao, Dushyant, Ren, Allen, Tabanpour, Baruch, Vuong, Quan, Wahid, Ayzaan, Xiao, Ted, Xu, Ying, Zhuang, Vincent, Xu, Peng, Frey, Erik, Caluwaerts, Ken, Zhang, Tingnan, Ichter, Brian, Tompson, Jonathan, Takayama, Leila, Vanhoucke, Vincent, Shafran, Izhak, Mataric, Maja, Sadigh, Dorsa, Heess, Nicolas, Rao, Kanishka, Stewart, Nik, Tan, Jie, Parada, Carolina
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new …
External link:
http://arxiv.org/abs/2402.11450
Author:
Nasiriany, Soroush, Xia, Fei, Yu, Wenhao, Xiao, Ted, Liang, Jacky, Dasgupta, Ishita, Xie, Annie, Driess, Danny, Wahid, Ayzaan, Xu, Zhuo, Vuong, Quan, Zhang, Tingnan, Lee, Tsang-Wei Edward, Lee, Kuang-Huei, Xu, Peng, Kirmani, Sean, Zhu, Yuke, Zeng, Andy, Hausman, Karol, Heess, Nicolas, Finn, Chelsea, Levine, Sergey, Ichter, Brian
Vision language models (VLMs) have shown impressive capabilities across a variety of tasks, from logical reasoning to visual understanding. This opens the door to richer interaction with the world, for example robotic control. However, VLMs produce …
External link:
http://arxiv.org/abs/2402.07872
Author:
Ahn, Michael, Dwibedi, Debidatta, Finn, Chelsea, Arenas, Montse Gonzalez, Gopalakrishnan, Keerthana, Hausman, Karol, Ichter, Brian, Irpan, Alex, Joshi, Nikhil, Julian, Ryan, Kirmani, Sean, Leal, Isabel, Lee, Edward, Levine, Sergey, Lu, Yao, Maddineni, Sharath, Rao, Kanishka, Sadigh, Dorsa, Sanketi, Pannag, Sermanet, Pierre, Vuong, Quan, Welker, Stefan, Xia, Fei, Xiao, Ted, Xu, Peng, Xu, Steve, Xu, Zhuo
Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is …
External link:
http://arxiv.org/abs/2401.12963
Author:
Chen, Boyuan, Xu, Zhuo, Kirmani, Sean, Ichter, Brian, Driess, Danny, Florence, Pete, Sadigh, Dorsa, Guibas, Leonidas, Xia, Fei
Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still …
External link:
http://arxiv.org/abs/2401.12168
Author:
Firoozi, Roya, Tucker, Johnathan, Tian, Stephen, Majumdar, Anirudha, Sun, Jiankai, Liu, Weiyu, Zhu, Yuke, Song, Shuran, Kapoor, Ashish, Hausman, Karol, Ichter, Brian, Driess, Danny, Wu, Jiajun, Lu, Cewu, Schwager, Mac
We survey applications of pretrained foundation models in robotics. Traditional deep learning models in robotics are trained on small datasets tailored for specific tasks, which limits their adaptability across diverse applications. In contrast, …
External link:
http://arxiv.org/abs/2312.07843
Author:
Li, Chengshu, Liang, Jacky, Zeng, Andy, Chen, Xinyun, Hausman, Karol, Sadigh, Dorsa, Levine, Sergey, Fei-Fei, Li, Xia, Fei, Ichter, Brian
Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not …
External link:
http://arxiv.org/abs/2312.04474
Author:
Sermanet, Pierre, Ding, Tianli, Zhao, Jeffrey, Xia, Fei, Dwibedi, Debidatta, Gopalakrishnan, Keerthana, Chan, Christine, Dulac-Arnold, Gabriel, Maddineni, Sharath, Joshi, Nikhil J, Florence, Pete, Han, Wei, Baruch, Robert, Lu, Yao, Mirchandani, Suvir, Xu, Peng, Sanketi, Pannag, Hausman, Karol, Shafran, Izhak, Ichter, Brian, Cao, Yuan
We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step …
External link:
http://arxiv.org/abs/2311.00899
This paper combines two contributions. First, we introduce an extension of the Meta-World benchmark, which we call "Language-World," which allows a large language model to operate in a simulated robotic environment using semi-structured natural …
External link:
http://arxiv.org/abs/2310.17019
Author:
Du, Yilun, Yang, Mengjiao, Florence, Pete, Xia, Fei, Wahid, Ayzaan, Ichter, Brian, Sermanet, Pierre, Yu, Tianhe, Abbeel, Pieter, Tenenbaum, Joshua B., Kaelbling, Leslie, Zeng, Andy, Tompson, Jonathan
We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present video …
External link:
http://arxiv.org/abs/2310.10625