Showing 1 - 10 of 239 for search: '"Agrawal, Harsh"'
Author:
Li, Zhangheng, You, Keen, Zhang, Haotian, Feng, Di, Agrawal, Harsh, Li, Xiujun, Moorthy, Mohana Prasad Sathya, Nichols, Jeff, Yang, Yinfei, Gan, Zhe
Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and data limitation. In this paper, we introduce Ferret-UI 2, a multimodal large language model …
External link:
http://arxiv.org/abs/2410.18967
Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with …
External link:
http://arxiv.org/abs/2406.07904
Author:
Szot, Andrew, Schwarzer, Max, Agrawal, Harsh, Mazoure, Bogdan, Talbott, Walter, Metcalf, Katherine, Mackraz, Natalie, Hjelm, Devon, Toshev, Alexander
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text …
External link:
http://arxiv.org/abs/2310.17722
Author:
Kant, Yash, Ramachandran, Arun, Yenamandra, Sriram, Gilitschenski, Igor, Batra, Dhruv, Szot, Andrew, Agrawal, Harsh
We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging misplaced objects without explicit instructions specifying which objects need to be rearranged …
External link:
http://arxiv.org/abs/2205.10712
Author:
Koh, Jing Yu, Agrawal, Harsh, Batra, Dhruv, Tucker, Richard, Waters, Austin, Lee, Honglak, Yang, Yinfei, Baldridge, Jason, Anderson, Peter
We study the problem of synthesizing immersive 3D indoor scenes from one or more images. Our aim is to generate high-resolution images and videos from novel viewpoints, including viewpoints that extrapolate far beyond the input images while maintaining …
External link:
http://arxiv.org/abs/2204.02960
Published in:
Experimental Hematology, August 2024, 136
Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location. This work presents a transformer-based vision-and-language …
External link:
http://arxiv.org/abs/2110.14143
It is fundamental for personal robots to reliably navigate to a specified goal. To study this task, PointGoal navigation has been introduced in simulated Embodied AI environments. Recent advances solve this PointGoal navigation task with near-perfect …
External link:
http://arxiv.org/abs/2108.11550
Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions. Existing approaches address this by augmenting the dataset with question paraphrases …
External link:
http://arxiv.org/abs/2010.06087
Author:
Kant, Yash, Batra, Dhruv, Anderson, Peter, Schwing, Alex, Parikh, Devi, Lu, Jiasen, Agrawal, Harsh
Textual cues are essential for everyday tasks like buying groceries and using public transport. To develop this assistive technology, we study the TextVQA task, i.e., reasoning about text in images to answer a question. Existing approaches are limited …
External link:
http://arxiv.org/abs/2007.12146