Showing 1 - 10 of 3,518
for search: '"Koppula P"'
Author:
Beyer, Lucas, Steiner, Andreas, Pinto, André Susano, Kolesnikov, Alexander, Wang, Xiao, Salz, Daniel, Neumann, Maxim, Alabdulmohsin, Ibrahim, Tschannen, Michael, Bugliarello, Emanuele, Unterthiner, Thomas, Keysers, Daniel, Koppula, Skanda, Liu, Fangyu, Grycner, Adam, Gritsenko, Alexey, Houlsby, Neil, Kumar, Manoj, Rong, Keran, Eisenschlos, Julian, Kabra, Rishabh, Bauer, Matthias, Bošnjak, Matko, Chen, Xi, Minderer, Matthias, Voigtlaender, Paul, Bica, Ioana, Balazevic, Ivana, Puigcerver, Joan, Papalampidi, Pinelopi, Henaff, Olivier, Xiong, Xi, Soricut, Radu, Harmsen, Jeremiah, Zhai, Xiaohua
PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong…
External link:
http://arxiv.org/abs/2407.07726
Author:
Koppula, Skanda, Rocco, Ignacio, Yang, Yi, Heyward, Joe, Carreira, João, Zisserman, Andrew, Brostow, Gabriel, Doersch, Carl
We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three…
External link:
http://arxiv.org/abs/2407.05921
Author:
Balažević, Ivana, Shi, Yuge, Papalampidi, Pinelopi, Chaabouni, Rahma, Koppula, Skanda, Hénaff, Olivier J.
Most transformer-based video encoders are limited to short temporal contexts due to their quadratic complexity. While various attempts have been made to extend this context, this has often come at the cost of both conceptual and computational complexity…
External link:
http://arxiv.org/abs/2402.05861
Author:
Doersch, Carl, Luc, Pauline, Yang, Yi, Gokay, Dilara, Koppula, Skanda, Gupta, Ankush, Heyward, Joseph, Rocco, Ignacio, Goroshin, Ross, Carreira, João, Zisserman, Andrew
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point…
External link:
http://arxiv.org/abs/2402.00847
Published in:
49th International Symposium on Mathematical Foundations of Computer Science (MFCS 2024)
The Polynomial-Time Hierarchy ($\mathsf{PH}$) is a staple of classical complexity theory, with applications spanning randomized computation to circuit lower bounds to "quantum advantage" analyses for near-term quantum computers. Quantumly, however,…
External link:
http://arxiv.org/abs/2401.01633
Author:
Papalampidi, Pinelopi, Koppula, Skanda, Pathak, Shreya, Chiu, Justin, Heyward, Joe, Patraucean, Viorica, Shen, Jiajun, Miech, Antoine, Zisserman, Andrew, Nematzdeh, Aida
Understanding long, real-world videos requires modeling of long-range visual dependencies. To this end, we explore video-first architectures, building on the common paradigm of transferring large-scale, image--text models to video via shallow temporal…
External link:
http://arxiv.org/abs/2312.07395
Author:
Su, Hsuan, Hu, Ting-Yao, Koppula, Hema Swetha, Vemulapalli, Raviteja, Chang, Jen-Hao Rick, Yang, Karren, Mantena, Gautam Varma, Tuzel, Oncel
While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily…
External link:
http://arxiv.org/abs/2309.10707
Author:
Pătrăucean, Viorica, Smaira, Lucas, Gupta, Ankush, Continente, Adrià Recasens, Markeeva, Larisa, Banarse, Dylan, Koppula, Skanda, Heyward, Joseph, Malinowski, Mateusz, Yang, Yi, Doersch, Carl, Matejovicova, Tatiana, Sulsky, Yury, Miech, Antoine, Frechette, Alex, Klimczak, Hanna, Koster, Raphael, Zhang, Junlin, Winkler, Stephanie, Aytar, Yusuf, Osindero, Simon, Damen, Dima, Zisserman, Andrew, Carreira, João
We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, SeViLA, or GPT-4). Compared to existing benchmarks that focus on computational tasks…
External link:
http://arxiv.org/abs/2305.13786
Author:
Sharma, Mohit, Fantacci, Claudio, Zhou, Yuxiang, Koppula, Skanda, Heess, Nicolas, Scholz, Jon, Aytar, Yusuf
Recent works have shown that large models pretrained on common visual learning tasks can provide useful representations for a wide range of specialized perception problems, as well as a variety of robotic manipulation tasks. While prior work on robot…
External link:
http://arxiv.org/abs/2304.06600
Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we…
External link:
http://arxiv.org/abs/2303.14885