Showing 1 - 10 of 10 for search: '"Sampat, Shailaja Keyur"'
Author:
Sampat, Shailaja Keyur, Nakamura, Mutsumi, Kailas, Shankar, Aggarwal, Kartik, Zhou, Mandy, Yang, Yezhou, Baral, Chitta
Deriving inferences from heterogeneous inputs (such as images, text, and audio) is an important skill for humans performing day-to-day tasks. A similar ability is desirable for the development of advanced Artificial Intelligence (AI) systems. While st…
External link:
http://arxiv.org/abs/2410.13666
Humans observe various actions being performed by other humans (physically or in videos/images) and can draw a wide range of inferences about them beyond what they can visually perceive. Such inferences include determining the aspects of the world that…
External link:
http://arxiv.org/abs/2410.13662
An ability to learn about new objects from a small amount of visual data and produce convincing linguistic justification about the presence/absence of certain concepts (that collectively compose the object) in novel scenarios is an important characte…
External link:
http://arxiv.org/abs/2410.13651
'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). This has been an important research dir…
External link:
http://arxiv.org/abs/2212.03866
'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). Recently, there has been growing intere…
External link:
http://arxiv.org/abs/2212.03433
'Actions' play a vital role in how humans interact with the world and enable them to achieve desired goals. As a result, most common sense (CS) knowledge for humans revolves around actions. While 'Reasoning about Actions & Change' (RAC) has been wide…
External link:
http://arxiv.org/abs/2207.07568
Author:
Wang, Yizhong, Mishra, Swaroop, Alipoormolabashi, Pegah, Kordi, Yeganeh, Mirzaei, Amirreza, Arunkumar, Anjana, Ashok, Arjun, Dhanasekaran, Arut Selvan, Naik, Atharva, Stap, David, Pathak, Eshaan, Karamanolakis, Giannis, Lai, Haizhi Gary, Purohit, Ishan, Mondal, Ishani, Anderson, Jacob, Kuznia, Kirby, Doshi, Krima, Patel, Maitreya, Pal, Kuntal Kumar, Moradshahi, Mehrad, Parmar, Mihir, Purohit, Mirali, Varshney, Neeraj, Kaza, Phani Rohitha, Verma, Pulkit, Puri, Ravsehaj Singh, Karia, Rushang, Sampat, Shailaja Keyur, Doshi, Savan, Mishra, Siddhartha, Reddy, Sujan, Patro, Sumanta, Dixit, Tanay, Shen, Xudong, Baral, Chitta, Choi, Yejin, Smith, Noah A., Hajishirzi, Hannaneh, Khashabi, Daniel
How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions…
External link:
http://arxiv.org/abs/2204.07705
Most existing research on visual question answering (VQA) is limited to information explicitly present in an image or a video. In this paper, we take visual understanding to a higher level where systems are challenged to answer questions that involve…
External link:
http://arxiv.org/abs/2104.05981
Author:
Luo, Man, Sampat, Shailaja Keyur, Tallman, Riley, Zeng, Yankai, Vancha, Manuha, Sajja, Akarshan, Baral, Chitta
GQA~\citep{hudson2019gqa} is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but still are…
External link:
http://arxiv.org/abs/2103.15022
Understanding images and text together is an important aspect of cognition and of building advanced Artificial Intelligence (AI) systems. As a community, we have achieved good benchmarks over the language and vision domains separately; however, joint reasoni…
External link:
http://arxiv.org/abs/2005.00330