Showing 1 - 10 of 71 for search: '"Sajjadi, Mehdi"'
Authors:
Carreira, João, Gokay, Dilara, King, Michael, Zhang, Chuhan, Rocco, Ignacio, Mahendran, Aravindh, Keck, Thomas Albert, Heyward, Joseph, Koppula, Skanda, Pot, Etienne, Erdogan, Goker, Hasson, Yana, Yang, Yi, Greff, Klaus, Moing, Guillaume Le, van Steenkiste, Sjoerd, Zoran, Daniel, Hudson, Drew A., Vélez, Pedro, Polanía, Luisa, Friedman, Luke, Duvarney, Chris, Goroshin, Ross, Allen, Kelsey, Walker, Jacob, Kabra, Rishabh, Aboussouan, Eric, Sun, Jennifer, Kipf, Thomas, Doersch, Carl, Pătrăucean, Viorica, Damen, Dima, Luc, Pauline, Sajjadi, Mehdi S. M., Zisserman, Andrew
Scaling has not yet been convincingly demonstrated for pure self-supervised learning from video. However, prior work has focused evaluations on semantic-related tasks: action classification, ImageNet classification, etc. In this paper…
External link:
http://arxiv.org/abs/2412.15212
Authors:
Pătrăucean, Viorica, He, Xu Owen, Heyward, Joseph, Zhang, Chuhan, Sajjadi, Mehdi S. M., Muraru, George-Cristian, Zholus, Artem, Karami, Mahdi, Goroshin, Ross, Chen, Yutian, Osindero, Simon, Carreira, João, Pascanu, Razvan
We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over…
External link:
http://arxiv.org/abs/2412.14294
Authors:
van Steenkiste, Sjoerd, Zoran, Daniel, Yang, Yi, Rubanova, Yulia, Kabra, Rishabh, Doersch, Carl, Gokay, Dilara, Heyward, Joseph, Pot, Etienne, Greff, Klaus, Hudson, Drew A., Keck, Thomas Albert, Carreira, Joao, Dosovitskiy, Alexey, Sajjadi, Mehdi S. M., Kipf, Thomas
Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged "on-the-grid," which biases patches or tokens to encode information at a specific sp…
External link:
http://arxiv.org/abs/2411.05927
Authors:
Seitzer, Maximilian, van Steenkiste, Sjoerd, Kipf, Thomas, Greff, Klaus, Sajjadi, Mehdi S. M.
Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transformer…
External link:
http://arxiv.org/abs/2310.06020
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of in…
External link:
http://arxiv.org/abs/2306.08068
Authors:
Zimmermann, Roland S., van Steenkiste, Sjoerd, Sajjadi, Mehdi S. M., Kipf, Thomas, Greff, Klaus
Self-supervised methods for learning object-centric representations have recently been applied successfully to various datasets. This progress is largely fueled by slot-based methods, whose ability to cluster visual scenes into meaningful objects hol…
External link:
http://arxiv.org/abs/2305.18890
The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a re…
External link:
http://arxiv.org/abs/2304.00947
Authors:
Pham, Trung, Maghoumi, Mehran, Jiang, Wanli, Jujjavarapu, Bala Siva Sashank, Sajjadi, Mehdi, Liu, Xin, Lin, Hsuan-Chu, Chen, Bor-Jeng, Truong, Giang, Fang, Chao, Kwon, Junghyun, Park, Minwoo
Robust, real-time perception of 3D world is essential to the autonomous vehicle. We introduce an end-to-end surround camera perception system, named NVAutoNet, for self-driving. NVAutoNet is a multi-task, multi-camera network which takes a variable s…
External link:
http://arxiv.org/abs/2303.12976
Authors:
Driess, Danny, Xia, Fei, Sajjadi, Mehdi S. M., Lynch, Corey, Chowdhery, Aakanksha, Ichter, Brian, Wahid, Ayzaan, Tompson, Jonathan, Vuong, Quan, Yu, Tianhe, Huang, Wenlong, Chebotar, Yevgen, Sermanet, Pierre, Duckworth, Daniel, Levine, Sergey, Vanhoucke, Vincent, Hausman, Karol, Toussaint, Marc, Greff, Klaus, Zeng, Andy, Mordatch, Igor, Florence, Pete
Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world…
External link:
http://arxiv.org/abs/2303.03378
Authors:
Biza, Ondrej, van Steenkiste, Sjoerd, Sajjadi, Mehdi S. M., Elsayed, Gamaleldin F., Mahendran, Aravindh, Kipf, Thomas
Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this direction…
External link:
http://arxiv.org/abs/2302.04973