Showing 1 - 10 of 47 for search: '"Cogswell, Michael"'
Authors:
Gong, Yunye, Shrestha, Robik, Claypoole, Jared, Cogswell, Michael, Ray, Arijit, Kanan, Christopher, Divakaran, Ajay
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks. Unlike current benchmarks that often focus on fact-based memorization and simple reasoning tasks without theoreti
External link:
http://arxiv.org/abs/2312.12716
Authors:
Gwilliam, Matthew, Cogswell, Michael, Ye, Meng, Sikka, Karan, Shrivastava, Abhinav, Divakaran, Ajay
Existing long video retrieval systems are trained and tested in the paragraph-to-video retrieval regime, where every long video is described by a single long paragraph. This neglects the richness and variety of possible valid descriptions of a video,
External link:
http://arxiv.org/abs/2312.00115
We present DRESS, a large vision language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. Fi
External link:
http://arxiv.org/abs/2311.10081
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can parse natural queries about the visual content and generate human-like outputs. In this work, we explore the ability of these models to demonstrate
External link:
http://arxiv.org/abs/2309.04461
Authors:
Schiappa, Madeline, Abdullah, Raiyaan, Azad, Shehreen, Claypoole, Jared, Cogswell, Michael, Divakaran, Ajay, Rawat, Yogesh
In recent years large visual-language (V+L) models have achieved great success in various downstream tasks. However, it is not well studied whether these models have a conceptual grasp of the visual content. In this work we focus on conceptual unders
External link:
http://arxiv.org/abs/2304.03659
If a Large Language Model (LLM) answers "yes" to the question "Are mountains tall?" then does it know what a mountain is? Can you rely on it responding correctly or incorrectly to other questions about mountains? The success of Large Language Models
External link:
http://arxiv.org/abs/2209.15093
Despite their success and popularity, deep neural networks (DNNs) are vulnerable when facing backdoor attacks. This impedes their wider adoption, especially in mission critical applications. This paper tackles the problem of Trojan detection, namely,
External link:
http://arxiv.org/abs/2110.08335
Authors:
Alipour, Kamran, Ray, Arijit, Lin, Xiao, Cogswell, Michael, Schulze, Jurgen P., Yao, Yi, Burachas, Giedrius T.
In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs. In this work, we show that show
External link:
http://arxiv.org/abs/2110.06863
Current pre-trained language models have lots of knowledge, but a more limited ability to use that knowledge. Bloom's Taxonomy helps educators teach children how to use knowledge by categorizing comprehension skills, so we use it to analyze and impro
External link:
http://arxiv.org/abs/2106.04653
Authors:
Ray, Arijit, Cogswell, Michael, Lin, Xiao, Alipour, Kamran, Divakaran, Ajay, Yao, Yi, Burachas, Giedrius
Attention maps, a popular heatmap-based explanation method for Visual Question Answering (VQA), are supposed to help users understand the model by highlighting portions of the image/question used by the model to infer answers. However, we see that us
External link:
http://arxiv.org/abs/2103.14712