Showing 1 - 10 of 30
for search: '"Kafle, Kushal"'
Author:
Lu, Jian, Srivastava, Shikhar, Chen, Junyu, Shrestha, Robik, Acharya, Manoj, Kafle, Kushal, Kanan, Christopher
With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence. However, the most popular datasets used to evaluate MLLMs are some of the e…
External link:
http://arxiv.org/abs/2408.05334
Vision Language Models (VLMs) such as CLIP are powerful models; however, they can exhibit unwanted biases, making them less safe when deployed directly in applications such as text-to-image or text-to-video retrieval, reverse search, or classification…
External link:
http://arxiv.org/abs/2406.11331
Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the origi…
External link:
http://arxiv.org/abs/2404.16123
Author:
Hua, Hang, Shi, Jing, Kafle, Kushal, Jenni, Simon, Zhang, Daoan, Collomosse, John, Cohen, Scott, Luo, Jiebo
Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoni…
External link:
http://arxiv.org/abs/2404.14715
We propose Subject-Conditional Relation Detection (SCoRD), where, conditioned on an input subject, the goal is to predict all its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challengin…
External link:
http://arxiv.org/abs/2308.12910
We propose a margin-based loss for tuning joint vision-language models so that their gradient-based explanations are consistent with region-level annotations provided by humans for relatively smaller grounding datasets. We refer to this objective as…
External link:
http://arxiv.org/abs/2206.15462
Dataset bias and spurious correlations can significantly impair generalization in deep neural networks. Many prior efforts have addressed this problem using either alternative loss functions or sampling strategies that focus on rare patterns. We prop…
External link:
http://arxiv.org/abs/2204.02426
Author:
Pham, Khoi, Kafle, Kushal, Lin, Zhe, Ding, Zhihong, Cohen, Scott, Tran, Quan, Shrivastava, Abhinav
Visual attributes constitute a large portion of the information contained in a scene. Objects can be described using a wide variety of attributes which portray their visual appearance (color, texture), geometry (shape, size, posture), and other intrinsic…
External link:
http://arxiv.org/abs/2106.09707
A critical problem in deep learning is that systems learn inappropriate biases, resulting in their inability to perform well on minority groups. This has led to the creation of multiple algorithms that endeavor to mitigate bias. However, it is not cl…
External link:
http://arxiv.org/abs/2104.00170
Author:
Teney, Damien, Kafle, Kushal, Shrestha, Robik, Abbasnejad, Ehsan, Kanan, Christopher, Hengel, Anton van den
Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels b…
External link:
http://arxiv.org/abs/2005.09241