Showing 1 - 10 of 319 for search: '"Shrivastava, Abhinav"'
Recent advancements in vision-language models (VLMs) offer potential for robot task planning, but challenges remain due to VLMs' tendency to generate incorrect action sequences. To address these limitations, we propose VeriGraph, a novel framework th
External link:
http://arxiv.org/abs/2411.10446
Training a policy that can generalize to unknown objects is a long-standing challenge within the field of robotics. The performance of a policy often drops significantly in situations where an object in the scene was not seen during training. To solv
External link:
http://arxiv.org/abs/2411.02482
We present LARP, a novel video tokenizer designed to overcome limitations in current video tokenization methods for autoregressive (AR) generative models. Unlike traditional patchwise tokenizers that directly encode local visual patches into discrete
External link:
http://arxiv.org/abs/2410.21264
Despite the abundant availability and content richness of video data, its high dimensionality poses challenges for video research. Recent advancements have explored the implicit representation for videos using neural networks, demonstrating strong p
External link:
http://arxiv.org/abs/2409.19429
Authors:
Swaminathan, Archana, Gupta, Anubhav, Gupta, Kamal, Maiya, Shishira R., Agarwal, Vatsal, Shrivastava, Abhinav
Neural Radiance Fields (NeRFs) have revolutionized the reconstruction of static scenes and objects in 3D, offering unprecedented quality. However, extending NeRFs to model dynamic objects or object articulations remains a challenging problem. Previou
External link:
http://arxiv.org/abs/2409.06703
Implicit Neural Representations (INRs) have emerged as powerful representations to encode all forms of data, including images, videos, audio, and scenes. Many INRs for video have been proposed for the compression task, and recent methods featur
External link:
http://arxiv.org/abs/2408.02672
We propose a simple yet effective approach for few-shot action recognition, emphasizing the disentanglement of motion and appearance representations. By harnessing recent progress in tracking, specifically point trajectories and self-supervised repre
External link:
http://arxiv.org/abs/2407.18249
We propose WayEx, a new method for learning complex goal-conditioned robotics tasks from a single demonstration. Our approach distinguishes itself from existing imitation learning methods by demanding fewer expert examples and eliminating the need fo
External link:
http://arxiv.org/abs/2407.15849
Authors:
Saini, Nirat, Bodla, Navaneeth, Shrivastava, Ashish, Ravichandran, Avinash, Zhang, Xiao, Shrivastava, Abhinav, Singh, Bharat
We introduce InVi, an approach for inserting or replacing objects within videos (referred to as inpainting) using off-the-shelf, text-to-image latent diffusion models. InVi targets controlled manipulation of objects and blending them seamlessly into
External link:
http://arxiv.org/abs/2407.10958
Authors:
Levy, Mara, Shrivastava, Abhinav
Learning to represent three-dimensional (3D) human pose given a two-dimensional (2D) image of a person is a challenging problem. To make the problem less ambiguous, it has become common practice to estimate 3D pose in the camera coordinate s
External link:
http://arxiv.org/abs/2407.07092