Showing 1 - 7 of 7 for search: '"Lal, Shamit"'
Author:
Li, Hao, Lal, Shamit, Li, Zhiheng, Xie, Yusheng, Wang, Ying, Zou, Yang, Majumder, Orchid, Manmatha, R., Tu, Zhuowen, Ermon, Stefano, Soatto, Stefano, Swaminathan, Ashwin
We empirically study the scaling properties of various Diffusion Transformers (DiTs) for text-to-image generation by performing extensive and rigorous ablations, including training scaled DiTs ranging from 0.3B up to 8B parameters on datasets up to 60
External link:
http://arxiv.org/abs/2412.12391
Visual odometry (VO) and SLAM have been using multi-view geometry via local structure from motion for decades. These methods have a slight disadvantage in challenging scenarios such as low-texture images, dynamic scenarios, etc. Meanwhile, use of dee
External link:
http://arxiv.org/abs/2309.04147
This paper explores self-supervised learning of amodal 3D feature representations from RGB and RGB-D posed images and videos, agnostic to object and scene semantic content, and evaluates the resulting scene representations in the downstream tasks of
External link:
http://arxiv.org/abs/2104.03851
Author:
Xian, Zhou, Lal, Shamit, Tung, Hsiao-Yu, Platanios, Emmanouil Antonios, Fragkiadaki, Katerina
We propose HyperDynamics, a dynamics meta-learning framework that conditions on an agent's interactions with the environment and optionally its visual observations, and generates the parameters of neural dynamics models based on inferred properties o
External link:
http://arxiv.org/abs/2103.09439
We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos. In this 3D feature space, objects do not in
External link:
http://arxiv.org/abs/2011.06464
Author:
Prabhudesai, Mihir, Lal, Shamit, Patil, Darshan, Tung, Hsiao-Yu, Harley, Adam W, Fragkiadaki, Katerina
We present neural architectures that disentangle RGB-D images into objects' shapes and styles and a map of the background scene, and explore their applications for few-shot 3D object detection and few-shot concept classification. Our networks incorpo
External link:
http://arxiv.org/abs/2011.03367
Author:
Prabhudesai, Mihir, Lal, Shamit, Tung, Hsiao-Yu Fish, Harley, Adam W., Potdar, Shubhankar, Fragkiadaki, Katerina
We propose a system that learns to detect objects and infer their 3D poses in RGB-D images. Many existing systems can identify objects and infer 3D poses, but they heavily rely on human labels and 3D annotations. The challenge here is to achieve this
External link:
http://arxiv.org/abs/2010.16279