Showing 1 - 10 of 196 for search: '"CARREIRA, João"'
Author:
Carreira, João, Gokay, Dilara, King, Michael, Zhang, Chuhan, Rocco, Ignacio, Mahendran, Aravindh, Keck, Thomas Albert, Heyward, Joseph, Koppula, Skanda, Pot, Etienne, Erdogan, Goker, Hasson, Yana, Yang, Yi, Greff, Klaus, Moing, Guillaume Le, van Steenkiste, Sjoerd, Zoran, Daniel, Hudson, Drew A., Vélez, Pedro, Polanía, Luisa, Friedman, Luke, Duvarney, Chris, Goroshin, Ross, Allen, Kelsey, Walker, Jacob, Kabra, Rishabh, Aboussouan, Eric, Sun, Jennifer, Kipf, Thomas, Doersch, Carl, Pătrăucean, Viorica, Damen, Dima, Luc, Pauline, Sajjadi, Mehdi S. M., Zisserman, Andrew
Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks – action classification, ImageNet classification, etc. In this paper…
External link:
http://arxiv.org/abs/2412.15212
Author:
Pătrăucean, Viorica, He, Xu Owen, Heyward, Joseph, Zhang, Chuhan, Sajjadi, Mehdi S. M., Muraru, George-Cristian, Zholus, Artem, Karami, Mahdi, Goroshin, Ross, Chen, Yutian, Osindero, Simon, Carreira, João, Pascanu, Razvan
We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over…
External link:
http://arxiv.org/abs/2412.14294
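The entry above describes a time-space-channel factorised video block: gated LRUs mix information over time and self-attention mixes over space. As a rough illustration of the factorisation idea only (not the paper's implementation; the decay constant, the per-frame attention, and the plain linear channel map are all assumptions), here is a minimal NumPy sketch where each axis of a (time, space, channel) tensor gets its own dedicated mixing stage:

```python
import numpy as np

def lru_time_mix(x, decay=0.9):
    """Linear recurrence over the time axis: h_t = a*h_{t-1} + (1-a)*x_t.
    A stand-in for the gated LRU named in the abstract (decay is an assumption)."""
    h = np.zeros_like(x)
    state = np.zeros_like(x[0])
    for t in range(x.shape[0]):
        state = decay * state + (1.0 - decay) * x[t]
        h[t] = state
    return h

def attention_space_mix(x):
    """Plain softmax self-attention over the space axis, applied per frame.
    x: (T, S, C); queries, keys and values are the input itself (no learned weights)."""
    T, S, C = x.shape
    out = np.empty_like(x)
    for t in range(T):
        scores = x[t] @ x[t].T / np.sqrt(C)           # (S, S) similarity
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)            # rows sum to 1
        out[t] = w @ x[t]
    return out

def channel_mix(x, W):
    """Channel mixing: a position-wise linear map over the channel axis."""
    return x @ W

def factorised_block(x, W):
    """One dedicated stage per dimension, applied in sequence."""
    x = lru_time_mix(x)         # mix over time
    x = attention_space_mix(x)  # mix over space
    x = channel_mix(x, W)       # mix over channels
    return x
```

Because each stage touches only one axis at a time, the block avoids full spatio-temporal attention; that separability is the point of the factorisation.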
We introduce a hierarchical probabilistic approach to go from a 2D image to multiview 3D: a diffusion "prior" models the unseen 3D geometry, which then conditions a diffusion "decoder" to generate novel views of the subject. We use a pointmap-based g…
External link:
http://arxiv.org/abs/2412.10273
Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video models and…
External link:
http://arxiv.org/abs/2411.19941
Author:
van Steenkiste, Sjoerd, Zoran, Daniel, Yang, Yi, Rubanova, Yulia, Kabra, Rishabh, Doersch, Carl, Gokay, Dilara, Heyward, Joseph, Pot, Etienne, Greff, Klaus, Hudson, Drew A., Keck, Thomas Albert, Carreira, Joao, Dosovitskiy, Alexey, Sajjadi, Mehdi S. M., Kipf, Thomas
Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged "on-the-grid," which biases patches or tokens to encode information at a specific sp…
External link:
http://arxiv.org/abs/2411.05927
Author:
Koppula, Skanda, Rocco, Ignacio, Yang, Yi, Heyward, Joe, Carreira, João, Zisserman, Andrew, Brostow, Gabriel, Doersch, Carl
We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many benchmarks measuring performance on real-world videos, such as TAPVid-DAVIS, three…
External link:
http://arxiv.org/abs/2407.05921
Author:
Doersch, Carl, Luc, Pauline, Yang, Yi, Gokay, Dilara, Koppula, Skanda, Gupta, Ankush, Heyward, Joseph, Rocco, Ignacio, Goroshin, Ross, Carreira, João, Zisserman, Andrew
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any…
External link:
http://arxiv.org/abs/2402.00847
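The two entries above concern Tracking-Any-Point (TAP), where an algorithm must track arbitrary query points through a video. A common way to score such tracks is the fraction of visible points predicted within a pixel threshold of ground truth, averaged over a ladder of thresholds. The sketch below assumes the widely used {1, 2, 4, 8, 16}-pixel ladder; the benchmarks' exact protocols (e.g. occlusion handling, per-track averaging) may differ:

```python
import numpy as np

def position_accuracy(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Average fraction of visible points predicted within each pixel
    threshold of the ground-truth location.

    pred, gt: (N, T, 2) point tracks in pixels (N tracks, T frames).
    visible:  (N, T) boolean mask of frames where the point is visible.
    Returns the mean fraction over the thresholds."""
    err = np.linalg.norm(pred - gt, axis=-1)  # (N, T) pixel distances
    fracs = []
    for thr in thresholds:
        correct = (err <= thr) & visible
        fracs.append(correct.sum() / max(visible.sum(), 1))
    return float(np.mean(fracs))
```

For example, a prediction that is uniformly 3 pixels off scores 0 at the 1- and 2-pixel thresholds and 1 at the 4-, 8- and 16-pixel thresholds, giving 0.6 overall.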
The First Perception Test challenge was held as a half-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2023, with the goal of benchmarking state-of-the-art video models on the recently proposed Perception Test benchmark…
External link:
http://arxiv.org/abs/2312.13090
Author:
Carreira, João, King, Michael, Pătrăucean, Viorica, Gokay, Dilara, Ionescu, Cătălin, Yang, Yi, Zoran, Daniel, Heyward, Joseph, Doersch, Carl, Aytar, Yusuf, Damen, Dima, Zisserman, Andrew
We introduce a framework for online learning from a single continuous video stream -- the way people and animals learn, without mini-batches, data augmentation or shuffling. This poses great challenges given the high correlation between consecutive v…
External link:
http://arxiv.org/abs/2312.00598
Author:
Venkataramanan, Shashanka, Rizve, Mamshad Nayeem, Carreira, João, Asano, Yuki M., Avrithis, Yannis
Self-supervised learning has unlocked the potential of scaling up pretraining to billions of images, since annotation is unnecessary. But are we making the best use of data? How more economical can we be? In this work, we attempt to answer this question…
External link:
http://arxiv.org/abs/2310.08584