Search results - "Gupta, Ankush"

Report

BootsTAP: Bootstrapped Training for Tracking-Any-Point

Authors: Doersch, Carl; Luc, Pauline; Yang, Yi; Gokay, Dilara; Koppula, Skanda; Gupta, Ankush; Heyward, Joseph; Rocco, Ignacio; Goroshin, Ross; Carreira, João; Zisserman, Andrew

To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any

External link: http://arxiv.org/abs/2402.00847

Report

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

Authors: Zhang, Chuhan; Gupta, Ankush; Zisserman, Andrew

We introduce an object-aware decoder for improving the performance of spatio-temporal representations on ego-centric videos. The key idea is to enhance object-awareness during training by tasking the model to predict hand positions, object positions,

External link: http://arxiv.org/abs/2308.07918

Report

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Authors: Doersch, Carl; Yang, Yi; Vecerik, Mel; Gokay, Dilara; Gupta, Ankush; Aytar, Yusuf; Carreira, Joao; Zisserman, Andrew

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candida

External link: http://arxiv.org/abs/2306.08637

Report

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational task

External link: http://arxiv.org/abs/2305.13786

Dissertation/Thesis

Putting cavitation to work: applications of strongly collapsing bubbles

Author: Gupta, Ankush

Strongly collapsing bubbles, whose presence and activity are often conveniently captured by the word ‘cavitation’, can produce profound effects in the medium and mechanisms in which they are produced. The word ‘cavitation’ then implies bubble

External link: https://hdl.handle.net/2144/45054

Report

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

Authors: Udandarao, Vishaal; Gupta, Ankush; Albanie, Samuel

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet effective way to train large-scale vision-language models. CLIP demonstrates impressive zero-shot classification and retrieval on diverse downstream tasks. However, to leverag

External link: http://arxiv.org/abs/2211.16198

Report

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Authors: Doersch, Carl; Gupta, Ankush; Markeeva, Larisa; Recasens, Adrià; Smaira, Lucas; Aytar, Yusuf; Carreira, João; Zisserman, Andrew; Yang, Yi

Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the p

External link: http://arxiv.org/abs/2211.03726

Report

Is an Object-Centric Video Representation Beneficial for Transfer?

Authors: Zhang, Chuhan; Gupta, Ankush; Zisserman, Andrew

The objective of this work is to learn an object-centric video representation, with the aim of improving transferability to novel tasks, i.e., tasks different from the pre-training task of action classification. To this end, we introduce a new object

External link: http://arxiv.org/abs/2207.10075
