Showing 1 - 10 of 3,781 for search: '"Gupta Vivek"'
Vision-language models (VLMs) excel at tasks requiring joint understanding of visual and linguistic information. A particularly promising yet under-explored application for these models lies in answering questions based on various kinds of maps. This
External link:
http://arxiv.org/abs/2409.00255
Author:
Mathur, Suyash Vardhan, Bafna, Jainit Sushil, Kartik, Kunal, Khandelwal, Harshita, Shrivastava, Manish, Gupta, Vivek, Bansal, Mohit, Roth, Dan
Existing datasets for tabular question answering typically focus exclusively on text within cells. However, real-world data is inherently multimodal, often blending images such as symbols, faces, icons, patterns, and charts with textual content in ta
External link:
http://arxiv.org/abs/2408.13860
Author:
Hoffmann, Jordan, James, Clancy W., Qiu, Hao, Glowacki, Marcin, Bannister, Keith W., Gupta, Vivek, Prochaska, Jason X., Bera, Apurba, Deller, Adam T., Gourdji, Kelly, Marnoch, Lachlan, Ryder, Stuart D., Scott, Danica R., Shannon, Ryan M., Tejos, Nicolas
Fast radio bursts (FRBs) are transient radio signals of extragalactic origins that are subjected to propagation effects such as dispersion and scattering. It follows then that these signals hold information regarding the medium they have traversed an
External link:
http://arxiv.org/abs/2408.05937
Temporal reasoning over tabular data presents substantial challenges for large language models (LLMs), as evidenced by recent research. In this study, we conduct a comprehensive analysis of temporal datasets to pinpoint the specific limitations of LL
External link:
http://arxiv.org/abs/2407.16030
Author:
Mukhopadhyay, Srija, Qidwai, Adnan, Garimella, Aparna, Ramu, Pritika, Gupta, Vivek, Roth, Dan
Chart question answering (CQA) is a crucial area of Visual Language Understanding. However, the robustness and consistency of current Visual Language Models (VLMs) in this field remain under-explored. This paper evaluates state-of-the-art VLMs on com
External link:
http://arxiv.org/abs/2407.11229
Cognitive textual and visual reasoning tasks, such as puzzles, series, and analogies, demand the ability to quickly reason, decipher, and evaluate patterns both textually and spatially. While LLMs and VLMs, through extensive training on large amounts
External link:
http://arxiv.org/abs/2407.10380
Tabular reasoning involves interpreting natural language queries about tabular data, which presents a unique challenge of combining language understanding with structured data analysis. Existing methods employ either textual reasoning, which excels i
External link:
http://arxiv.org/abs/2407.05952
Author:
Singh, Shubhankar, Chaurasia, Purvi, Varun, Yerram, Pandya, Pranshu, Gupta, Vatsal, Gupta, Vivek, Roth, Dan
Existing benchmarks for visual question answering lack visual grounding and complexity, particularly in evaluating spatial reasoning skills. We introduce FlowVQA, a novel benchmark aimed at assessing the capabilities of visual question-answering m
External link:
http://arxiv.org/abs/2406.19237
To completely understand a document, the use of textual information is not enough. Understanding visual cues, such as layouts and charts, is also required. While the current state-of-the-art approaches for document understanding (both OCR-based and O
External link:
http://arxiv.org/abs/2406.10085
Large Language Models (LLMs) excel in natural language understanding, but their capability for complex mathematical reasoning with an amalgamation of structured tables and unstructured text is uncertain. This study explores LLMs' mathematical reason
External link:
http://arxiv.org/abs/2402.11194