Showing 1 - 10 of 526 for search: '"Gupta Ankush"'
Author:
Doersch, Carl, Luc, Pauline, Yang, Yi, Gokay, Dilara, Koppula, Skanda, Gupta, Ankush, Heyward, Joseph, Rocco, Ignacio, Goroshin, Ross, Carreira, João, Zisserman, Andrew
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point …
External link:
http://arxiv.org/abs/2402.00847
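
The record above formalizes the Tracking-Any-Point (TAP) task. As a minimal sketch of the task's input/output contract (the function name, shapes, and placeholder logic below are illustrative assumptions, not the paper's actual API):

    import numpy as np

    def track_any_point(video: np.ndarray, query: tuple[int, int, int]):
        """Hypothetical TAP interface: given a video (T, H, W, 3) and a
        query point (frame t, row y, col x), return the point's (y, x)
        position in every frame plus a per-frame occlusion flag.
        A learned model would go here; this stub returns a constant track.
        """
        T = video.shape[0]
        _, y, x = query
        tracks = np.tile(np.array([y, x], dtype=np.float32), (T, 1))  # (T, 2)
        occluded = np.zeros(T, dtype=bool)                            # (T,)
        return tracks, occluded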
We introduce an object-aware decoder for improving the performance of spatio-temporal representations on ego-centric videos. The key idea is to enhance object-awareness during training by tasking the model to predict hand positions, object positions, …
External link:
http://arxiv.org/abs/2308.07918
Author:
Doersch, Carl, Yang, Yi, Vecerik, Mel, Gokay, Dilara, Gupta, Ankush, Aytar, Yusuf, Carreira, João, Zisserman, Andrew
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate …
External link:
http://arxiv.org/abs/2306.08637
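
The abstract above outlines a two-stage design: per-frame matching followed by refinement. A hedged sketch of that control flow (the function names and signatures are illustrative assumptions, not the paper's code):

    def two_stage_tap(video, query_point, matcher, refiner):
        # Stage 1: independently locate a candidate match for the
        # query point in every frame.
        candidates = [matcher(frame, query_point) for frame in video]
        # Stage 2: refine the candidate trajectory using temporal
        # context across the whole video.
        return refiner(video, candidates)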
Author:
Pătrăucean, Viorica, Smaira, Lucas, Gupta, Ankush, Continente, Adrià Recasens, Markeeva, Larisa, Banarse, Dylan, Koppula, Skanda, Heyward, Joseph, Malinowski, Mateusz, Yang, Yi, Doersch, Carl, Matejovicova, Tatiana, Sulsky, Yury, Miech, Antoine, Frechette, Alex, Klimczak, Hanna, Koster, Raphael, Zhang, Junlin, Winkler, Stephanie, Aytar, Yusuf, Osindero, Simon, Damen, Dima, Zisserman, Andrew, Carreira, João
We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks …
External link:
http://arxiv.org/abs/2305.13786
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverage …
External link:
http://arxiv.org/abs/2211.16198
Author:
Doersch, Carl, Gupta, Ankush, Markeeva, Larisa, Recasens, Adrià, Smaira, Lucas, Aytar, Yusuf, Carreira, João, Zisserman, Andrew, Yang, Yi
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem …
External link:
http://arxiv.org/abs/2211.03726
The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object-centric …
External link:
http://arxiv.org/abs/2207.10075
Author:
Gupta, Ankush, Suhag, Sathans
Published in:
Sustainable Materials and Technologies, September 2024, Vol. 41
Author:
Jaiswal, Lav Kumar, Singh, Rakesh Kumar, Nayak, Tanmayee, Kakkar, Anuja, Kandwal, Garima, Singh, Vijay Shankar, Gupta, Ankush
Published in:
Infection, Genetics and Evolution, September 2024, Vol. 123
Author:
Kakkar, Anuja, Kandwal, Garima, Nayak, Tanmayee, Jaiswal, Lav Kumar, Srivastava, Amit, Gupta, Ankush
Published in:
Heliyon, 30 July 2024, Vol. 10(14)