Showing 1 - 10 of 54 for search: '"van Steenkiste, Sjoerd"'
Large language models are increasingly trained on corpora containing both natural language and non-linguistic data like source code. Aside from aiding programming-related tasks, anecdotal evidence suggests that including code in pretraining corpora m…
External link:
http://arxiv.org/abs/2409.04556
Author:
Nayak, Shravan, Jain, Kanishk, Awal, Rabiul, Reddy, Siva, van Steenkiste, Sjoerd, Hendricks, Lisa Anne, Stańczak, Karolina, Agrawal, Aishwarya
Foundation models and vision-language pre-training have notably advanced Vision Language Models (VLMs), enabling multimodal processing of visual and linguistic data. However, their performance has typically been assessed on general scene understandin…
External link:
http://arxiv.org/abs/2407.10920
Author:
Wu, Ziyi, Rubanova, Yulia, Kabra, Rishabh, Hudson, Drew A., Gilitschenski, Igor, Aytar, Yusuf, van Steenkiste, Sjoerd, Allen, Kelsey R., Kipf, Thomas
We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects…
External link:
http://arxiv.org/abs/2406.09292
Author:
Eisape, Tiwalayo, Tessler, MH, Dasgupta, Ishita, Sha, Fei, van Steenkiste, Sjoerd, Linzen, Tal
A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do lang…
External link:
http://arxiv.org/abs/2311.00445
Author:
Petty, Jackson, van Steenkiste, Sjoerd, Dasgupta, Ishita, Sha, Fei, Garrette, Dan, Linzen, Tal
To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, mo…
External link:
http://arxiv.org/abs/2310.19956
Author:
Seitzer, Maximilian, van Steenkiste, Sjoerd, Kipf, Thomas, Greff, Klaus, Sajjadi, Mehdi S. M.
Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transform…
External link:
http://arxiv.org/abs/2310.06020
Recent progress in 3D scene understanding enables scalable learning of representations across large datasets of diverse scenes. As a consequence, generalization to unseen scenes and objects, rendering novel views from just a single or a handful of in…
External link:
http://arxiv.org/abs/2306.08068
Author:
Zimmermann, Roland S., van Steenkiste, Sjoerd, Sajjadi, Mehdi S. M., Kipf, Thomas, Greff, Klaus
Self-supervised methods for learning object-centric representations have recently been applied successfully to various datasets. This progress is largely fueled by slot-based methods, whose ability to cluster visual scenes into meaningful objects hol…
External link:
http://arxiv.org/abs/2305.18890
Author:
Dehghani, Mostafa, Djolonga, Josip, Mustafa, Basil, Padlewski, Piotr, Heek, Jonathan, Gilmer, Justin, Steiner, Andreas, Caron, Mathilde, Geirhos, Robert, Alabdulmohsin, Ibrahim, Jenatton, Rodolphe, Beyer, Lucas, Tschannen, Michael, Arnab, Anurag, Wang, Xiao, Riquelme, Carlos, Minderer, Matthias, Puigcerver, Joan, Evci, Utku, Kumar, Manoj, van Steenkiste, Sjoerd, Elsayed, Gamaleldin F., Mahendran, Aravindh, Yu, Fisher, Oliver, Avital, Huot, Fantine, Bastings, Jasmijn, Collier, Mark Patrick, Gritsenko, Alexey, Birodkar, Vighnesh, Vasconcelos, Cristina, Tay, Yi, Mensink, Thomas, Kolesnikov, Alexander, Pavetić, Filip, Tran, Dustin, Kipf, Thomas, Lučić, Mario, Zhai, Xiaohua, Keysers, Daniel, Harmsen, Jeremiah, Houlsby, Neil
The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image an…
External link:
http://arxiv.org/abs/2302.05442
Author:
Biza, Ondrej, van Steenkiste, Sjoerd, Sajjadi, Mehdi S. M., Elsayed, Gamaleldin F., Mahendran, Aravindh, Kipf, Thomas
Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this di…
External link:
http://arxiv.org/abs/2302.04973