Showing 1 - 10 of 47,105 results for search: '"A Shrivastava"'
Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation of dynamic scenes by modelling objects and their evolving relationships over time. However, real-world visual relationships often exhibit a long-tailed distribution,
External link:
http://arxiv.org/abs/2411.13059
Recent advancements in vision-language models (VLMs) offer potential for robot task planning, but challenges remain due to VLMs' tendency to generate incorrect action sequences. To address these limitations, we propose VeriGraph, a novel framework th
External link:
http://arxiv.org/abs/2411.10446
Training a policy that can generalize to unknown objects is a long-standing challenge within the field of robotics. The performance of a policy often drops significantly in situations where an object in the scene was not seen during training. To solv
External link:
http://arxiv.org/abs/2411.02482
Author:
An, Chenyang, Imani, Shima, Yao, Feng, Dong, Chengyu, Abbasi, Ali, Shrivastava, Harsh, Buss, Samuel, Shang, Jingbo, Mahalingam, Gayathri, Sharma, Pramod, Diesendruck, Maurice
In the field of large language model (LLM)-based proof generation, despite being trained on extensive corpora such as OpenWebMath and arXiv, these models still exhibit only modest performance on proving tasks of moderate difficulty. We believe that t
External link:
http://arxiv.org/abs/2411.00863
We present LARP, a novel video tokenizer designed to overcome limitations in current video tokenization methods for autoregressive (AR) generative models. Unlike traditional patchwise tokenizers that directly encode local visual patches into discrete
External link:
http://arxiv.org/abs/2410.21264
Author:
Bhan, Nirav, Gupta, Shival, Manaswini, Sai, Baba, Ritik, Yadav, Narun, Desai, Hillori, Choudhary, Yash, Pawar, Aman, Shrivastava, Sarthak, Biswas, Sudipta
Large Language Models (LLMs) have shown remarkable capabilities in various domains, yet their economic impact has been limited by challenges in tool use and function calling. This paper introduces ThorV2, a novel architecture that significantly enhan
External link:
http://arxiv.org/abs/2410.17950
We obtain $L^p$-estimates for the full and lacunary maximal functions associated to the twisted bilinear spherical averages given by \[\mathfrak{A}_t(f_1,f_2)(x,y)=\int_{\mathbb S^{2d-1}}f_1(x+tz_1,y)f_2(x,y+tz_2)\;d\sigma(z_1,z_2),\;t>0,\] for all d
External link:
http://arxiv.org/abs/2410.17583
There is an increasing interest in using language models (LMs) for automated decision-making, with multiple countries actively testing LMs to aid in military crisis decision-making. To scrutinize relying on LM decision-making in high-stakes settings,
External link:
http://arxiv.org/abs/2410.13204
Autoencoders have been used for finding interpretable and disentangled features underlying neural network representations in both image and text domains. While the efficacy and pitfalls of such methods are well-studied in vision, there is a lack of c
External link:
http://arxiv.org/abs/2410.11767
Deep learning and advancements in contactless sensors have significantly enhanced our ability to understand complex human activities in healthcare settings. In particular, deep learning models utilizing computer vision have been developed to enable d
External link:
http://arxiv.org/abs/2410.09339