Search results - "Cogswell, Michael"

Report

BloomVQA: Assessing Hierarchical Multi-modal Comprehension

Authors: Gong, Yunye, Shrestha, Robik, Claypoole, Jared, Cogswell, Michael, Ray, Arijit, Kanan, Christopher, Divakaran, Ajay

We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoreti

External link: http://arxiv.org/abs/2312.12716

Report

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

Authors: Gwilliam, Matthew, Cogswell, Michael, Ye, Meng, Sikka, Karan, Shrivastava, Abhinav, Divakaran, Ajay

Existing long video retrieval systems are trained and tested in the paragraph-to-video retrieval regime, where every long video is described by a single long paragraph. This neglects the richness and variety of possible valid descriptions of a video,

External link: http://arxiv.org/abs/2312.00115

Report

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Authors: Chen, Yangyi, Sikka, Karan, Cogswell, Michael, Ji, Heng, Divakaran, Ajay

We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. Fi

External link: http://arxiv.org/abs/2311.10081

Report

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

Authors: Chen, Yangyi, Sikka, Karan, Cogswell, Michael, Ji, Heng, Divakaran, Ajay

Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can parse natural queries about the visual content and generate human-like outputs. In this work, we explore the ability of these models to demonstrate

External link: http://arxiv.org/abs/2309.04461

Report

Probing Conceptual Understanding of Large Visual-Language Models

Authors: Schiappa, Madeline, Abdullah, Raiyaan, Azad, Shehreen, Claypoole, Jared, Cogswell, Michael, Divakaran, Ajay, Rawat, Yogesh

In recent years large visual-language (V+L) models have achieved great success in various downstream tasks. However, it is not well studied whether these models have a conceptual grasp of the visual content. In this work we focus on conceptual unders

External link: http://arxiv.org/abs/2304.03659

Report

Unpacking Large Language Models with Conceptual Consistency

Authors: Sahu, Pritish, Cogswell, Michael, Gong, Yunye, Divakaran, Ajay

If a Large Language Model (LLM) answers "yes" to the question "Are mountains tall?" then does it know what a mountain is? Can you rely on it responding correctly or incorrectly to other questions about mountains? The success of Large Language Models

External link: http://arxiv.org/abs/2209.15093

Report

Trigger Hunting with a Topological Prior for Trojan Detection

Authors: Hu, Xiaoling, Lin, Xiao, Cogswell, Michael, Yao, Yi, Jha, Susmit, Chen, Chao

Despite their success and popularity, deep neural networks (DNNs) are vulnerable when facing backdoor attacks. This impedes their wider adoption, especially in mission critical applications. This paper tackles the problem of Trojan detection, namely,

External link: http://arxiv.org/abs/2110.08335

Report

Improving Users' Mental Model with Attention-directed Counterfactual Edits

Authors: Alipour, Kamran, Ray, Arijit, Lin, Xiao, Cogswell, Michael, Schulze, Jurgen P., Yao, Yi, Burachas, Giedrius T.

In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs. In this work, we show that show

External link: http://arxiv.org/abs/2110.06863

Report

Comprehension Based Question Answering using Bloom's Taxonomy

Authors: Sahu, Pritish, Cogswell, Michael, Rutherford-Quach, Sara, Divakaran, Ajay

Current pre-trained language models have lots of knowledge, but a more limited ability to use that knowledge. Bloom's Taxonomy helps educators teach children how to use knowledge by categorizing comprehension skills, so we use it to analyze and impro

External link: http://arxiv.org/abs/2106.04653

Report

Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models

Authors: Ray, Arijit, Cogswell, Michael, Lin, Xiao, Alipour, Kamran, Divakaran, Ajay, Yao, Yi, Burachas, Giedrius

Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that us

External link: http://arxiv.org/abs/2103.14712
