Showing 1 - 10 of 614 for search: '"ROTH, DAN"'
Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and linguistic information. A particularly promising yet under-explored application for these models lies in answering questions based on various kinds of maps. This
External link:
http://arxiv.org/abs/2409.00255
Authors:
Mathur, Suyash Vardhan, Bafna, Jainit Sushil, Kartik, Kunal, Khandelwal, Harshita, Shrivastava, Manish, Gupta, Vivek, Bansal, Mohit, Roth, Dan
Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in ta
External link:
http://arxiv.org/abs/2408.13860
Temporal reasoning over tabular data presents substantial challenges for large language models (LLMs), as evidenced by recent research. In this study, we conduct a comprehensive analysis of temporal datasets to pinpoint the specific limitations of LL
External link:
http://arxiv.org/abs/2407.16030
Authors:
Mukhopadhyay, Srija, Qidwai, Adnan, Garimella, Aparna, Ramu, Pritika, Gupta, Vivek, Roth, Dan
Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on com
External link:
http://arxiv.org/abs/2407.11229
Cognitive textual and visual reasoning tasks, such as puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. While LLMs and VLMs, through extensive training on large amounts
External link:
http://arxiv.org/abs/2407.10380
Multi-Instance Partial Label Learning (MI-PLL) is a weakly-supervised learning setting encompassing partial label learning, latent structural learning, and neurosymbolic learning. Differently from supervised learning, in MI-PLL, the inputs to the cla
External link:
http://arxiv.org/abs/2407.10000
Tabular reasoning involves interpreting unstructured queries against structured tables, requiring a synthesis of textual understanding and symbolic reasoning. Existing methods rely on either of the approaches and are constrained by their respective l
External link:
http://arxiv.org/abs/2407.05952
Authors:
Singh, Shubhankar, Chaurasia, Purvi, Varun, Yerram, Pandya, Pranshu, Gupta, Vatsal, Gupta, Vivek, Roth, Dan
Existing benchmarks for visual question answering lack in visual grounding and complexity, particularly in evaluating spatial reasoning skills. We introduce FlowVQA, a novel benchmark aimed at assessing the capabilities of visual question-answering m
External link:
http://arxiv.org/abs/2406.19237
Language models have shown impressive in-context-learning capabilities, which allow them to benefit from input prompts and perform better on downstream end tasks. Existing works investigate the mechanisms behind this observation, and propose label-ag
External link:
http://arxiv.org/abs/2406.11243
Authors:
Jiang, Bowen, Xie, Yangxinyu, Hao, Zhuoqun, Wang, Xiaomeng, Mallick, Tanwi, Su, Weijie J., Taylor, Camillo J., Roth, Dan
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their t
External link:
http://arxiv.org/abs/2406.11050