Showing 1 - 10 of 80 results for search: "Sahu, Pritish"
Large Visual Language Models (LVLMs) struggle with hallucinations in visual instruction-following task(s), limiting their trustworthiness and real-world applicability. We propose Pelican -- a novel framework designed to detect and mitigate hallucinat…
External link:
http://arxiv.org/abs/2407.02352
If a Large Language Model (LLM) answers "yes" to the question "Are mountains tall?" then does it know what a mountain is? Can you rely on it responding correctly or incorrectly to other questions about mountains? The success of Large Language Models…
External link:
http://arxiv.org/abs/2209.15093
We present a novel computational model, "SAViR-T", for the family of visual reasoning problems embodied in Raven's Progressive Matrices (RPM). Our model considers explicit spatial semantics of visual elements within each image in the puzzle, enco…
External link:
http://arxiv.org/abs/2206.09265
We focus on Multimodal Machine Reading Comprehension (M3C), where a model is expected to answer questions based on a given passage (or context), and the context and the questions can be in different modalities. Previous works such as RecipeQA have propo…
External link:
http://arxiv.org/abs/2110.11899
Computational learning approaches to solving visual reasoning tests, such as Raven's Progressive Matrices (RPM), critically depend on the ability to identify the visual concepts used in the test (i.e., the representation) as well as the latent rules…
External link:
http://arxiv.org/abs/2109.13156
Published in:
Marine Pollution Bulletin, June 2024, 203
Published in:
Heliyon, 15 May 2024, 10(9)
Current pre-trained language models have a great deal of knowledge, but a more limited ability to use that knowledge. Bloom's Taxonomy helps educators teach children how to use knowledge by categorizing comprehension skills, so we use it to analyze and impro…
External link:
http://arxiv.org/abs/2106.04653
This paper targets the problem of procedural multimodal machine comprehension (M3C). This task requires an AI to comprehend given steps of multimodal instructions and then answer questions. Compared to vanilla machine comprehension tasks where an AI…
External link:
http://arxiv.org/abs/2104.10139
Authors:
Sikka, Karan, Huang, Jihua, Silberfarb, Andrew, Nayak, Prateeth, Rohrer, Luke, Sahu, Pritish, Byrnes, John, Divakaran, Ajay, Rohwer, Richard
We improve zero-shot learning (ZSL) by incorporating common-sense knowledge in DNNs. We propose Common-Sense based Neuro-Symbolic Loss (CSNL), which formulates prior knowledge as novel neuro-symbolic loss functions that regularize visual-semantic embed…
External link:
http://arxiv.org/abs/2011.10889