Search results - "Sajjadi, Mehdi"

Report

Authors: Pătrăucean, Viorica; He, Xu Owen; Heyward, Joseph; Zhang, Chuhan; Sajjadi, Mehdi S. M.; Muraru, George-Cristian; Zholus, Artem; Karami, Mahdi; Goroshin, Ross; Chen, Yutian; Osindero, Simon; Carreira, João; Pascanu, Razvan

We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing ove

External link: http://arxiv.org/abs/2412.14294

Report

Moving Off-the-Grid: Scene-Grounded Video Representations

Authors: van Steenkiste, Sjoerd; Zoran, Daniel; Yang, Yi; Rubanova, Yulia; Kabra, Rishabh; Doersch, Carl; Gokay, Dilara; Heyward, Joseph; Pot, Etienne; Greff, Klaus; Hudson, Drew A.; Keck, Thomas Albert; Carreira, Joao; Dosovitskiy, Alexey; Sajjadi, Mehdi S. M.; Kipf, Thomas

Current vision models typically maintain a fixed correspondence between their representation structure and image space. Each layer comprises a set of tokens arranged "on-the-grid," which biases patches or tokens to encode information at a specific sp

External link: http://arxiv.org/abs/2411.05927

Report

DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

Authors: Seitzer, Maximilian; van Steenkiste, Sjoerd; Kipf, Thomas; Greff, Klaus; Sajjadi, Mehdi S. M.

Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transform

External link: http://arxiv.org/abs/2310.06020

Report

DORSal: Diffusion for Object-centric Representations of Scenes et al

Authors: Jabri, Allan; van Steenkiste, Sjoerd; Hoogeboom, Emiel; Sajjadi, Mehdi S. M.; Kipf, Thomas

Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of in

External link: http://arxiv.org/abs/2306.08068

Report

Sensitivity of Slot-Based Object-Centric Models to their Number of Slots

Authors: Zimmermann, Roland S.; van Steenkiste, Sjoerd; Sajjadi, Mehdi S. M.; Kipf, Thomas; Greff, Klaus

Self-supervised methods for learning object-centric representations have recently been applied successfully to various datasets. This progress is largely fueled by slot-based methods, whose ability to cluster visual scenes into meaningful objects hol

External link: http://arxiv.org/abs/2305.18890

Report

RePAST: Relative Pose Attention Scene Representation Transformer

Authors: Safin, Aleksandr; Duckworth, Daniel; Sajjadi, Mehdi S. M.

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a re

External link: http://arxiv.org/abs/2304.00947

Report

NVAutoNet: Fast and Accurate 360° 3D Visual Perception For Self Driving

Authors: Pham, Trung; Maghoumi, Mehran; Jiang, Wanli; Jujjavarapu, Bala Siva Sashank; Sajjadi, Mehdi; Liu, Xin; Lin, Hsuan-Chu; Chen, Bor-Jeng; Truong, Giang; Fang, Chao; Kwon, Junghyun; Park, Minwoo

Robust, real-time perception of 3D world is essential to the autonomous vehicle. We introduce an end-to-end surround camera perception system, named NVAutoNet, for self-driving. NVAutoNet is a multi-task, multi-camera network which takes a variable s

External link: http://arxiv.org/abs/2303.12976

Report

PaLM-E: An Embodied Multimodal Language Model

Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-worl

External link: http://arxiv.org/abs/2303.03378

Report

Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames

Authors: Biza, Ondrej; van Steenkiste, Sjoerd; Sajjadi, Mehdi S. M.; Elsayed, Gamaleldin F.; Mahendran, Aravindh; Kipf, Thomas

Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this di

External link: http://arxiv.org/abs/2302.04973
